Dual Isoform Sequencing Reveals a Multifaceted Transcriptional Architecture of a Prototype Baculovirus

In this study, we used two long-read sequencing (LRS) techniques, Sequel from Pacific Biosciences and MinION from Oxford Nanopore Technologies, for the transcriptional characterization of a prototype baculovirus, Autographa californica multiple nucleopolyhedrovirus. LRS is able to read full-length RNA molecules, and thereby to distinguish between transcript isoforms, mono- and polycistronic RNAs, and overlapping transcripts. Altogether, we detected 875 transcripts, of which 759 are novel and 116 have been annotated previously. These RNA molecules include 41 novel putative protein-coding transcripts (each containing 5'-truncated in-frame ORFs), 14 monocistronic transcripts, 99 multicistronic RNAs, 101 non-coding RNAs, and 504 length isoforms. We also detected RNA methylation in 12 viral genes and RNA hyper-editing in the longer 5'-UTR transcript isoform of the ORF19 gene.

with Blunt/TA Ligase Master Mix (New England Biolabs). PCR reactions were carried out to amplify the adapter-ligated samples with KAPA HiFi DNA polymerase (Kapa Biosystems) and PCR barcodes (ONT). After PCR, samples were mixed and the second end-prep and adapter ligation were carried out according to ONT's 1D Strand switching cDNA by ligation method. Sequencing of the mixtures of the barcode-labeled samples was performed on ONT R9.4 SpotON Flow Cells.


Introduction
The Autographa californica multiple nucleopolyhedrovirus (AcMNPV) is an insect virus belonging to the Baculoviridae family 1 4 . The AcMNPV genes are expressed in three phases: early (E), late (L) and very late (VL) 5 .
Early transcription [0 to 6 hours post infection (p.i.)] produces transcriptional activators 6 and the molecular machinery of DNA replication 7 . E genes are defined as those transcribed by the host RNA polymerase II, which recognizes the TATA promoter elements located upstream of the transcriptional start site (TSS) or the arthropod initiator element, CAGT 8 , although some E genes lack a canonical initiator sequence or any recognizable promoter motif 9 . After a transitory early/late phase, some E genes cease to be expressed, while others, supposedly due to the presence of both early and late promoters and/or initiators, are transcribed throughout the entire infection cycle 10 . The L phase starts at the onset of genome replication (6 to 18 h p.i.). L and VL genes are transcribed by the viral RNA polymerase, which recognizes a consensus late initiator sequence (TAAG) on the DNA and starts to synthesize the RNAs from the second nucleotide of the motif (the first A) 11,12 . In our previous study, we demonstrated that longer and shorter 5'-untranslated region (UTR) isoforms of a given late transcript start from a late initiation sequence (LIS) as well 13 . VL gene expression (18 to 72 h p.i.) is characterized by the synthesis of the occlusion body proteins polyhedrin and p10, and of transcription factors such as the very late expression factor 1 (VLF-1). VL genes contain an A/T-rich region 14 , called the 'burst sequence', downstream of their LIS; it is recognized by VLF-1, which facilitates their high expression 15 .
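As a minimal sketch of the late-initiator rule described above (transcription starting at the second nucleotide of TAAG), the following Python function checks whether a forward-strand TSS sits on the first A of a TAAG motif. The function name, the 0-based coordinate convention and the toy sequence are illustrative assumptions and not part of the original analysis.

```python
def starts_at_late_initiator(genome: str, tss: int) -> bool:
    """Return True if a forward-strand TSS lies on the second nucleotide of TAAG.

    `tss` is a 0-based genome coordinate (an assumption of this sketch);
    a reverse-strand TSS would require the reverse complement first.
    """
    if tss < 1:
        return False
    return genome[tss - 1:tss + 3].upper() == "TAAG"


# Toy sequence (hypothetical): the T of TAAG is at index 2, the first A at index 3.
toy = "CCTAAGCC"
print(starts_at_late_initiator(toy, 3))  # TSS on the first A of TAAG
print(starts_at_late_initiator(toy, 2))  # TSS on the T, one base too early
```

The check is deliberately strict; a real annotation pipeline would probably tolerate a small positional window around the TSS.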
Capping of the viral RNAs is accomplished by both host and viral proteins: LEF-4 exhibits RNA 5' triphosphatase and guanylyl transferase activities 16 , whereas MTase (encoded by Ac69) methylates the guanosine in the cap structure 17 . Most of the AcMNPV transcripts contain a canonical polyadenylation signal (PAS) upstream of their transcriptional end site (TES). PASs are recognized by the host cleavage and polyadenylation apparatus, which cleaves the transcripts in their 3'-UTR regions and carries out a nontemplated addition of adenines. It has been shown 18 that the viral RNA polymerase can also catalyze poly(A) tail formation after transcribing uracil-rich regions, which can result in alternative terminations in the late transcripts 13 .
Early works leading to the discovery of the mRNA cap structure also detected a low level of internal 5-methylcytosine (5-mC) in mammalian cells 19 , and further studies discovered methylated nucleotides in viral RNAs 20,21 as well. The 5-mC has been shown to be linked to the metabolic stability of tRNAs 22,23 and acts as a suppressor of translation when present at position 34 in eukaryotic tRNA Leu 24 .
The 5-mC present in rRNA may participate in tRNA recognition and peptidyl transfer 25 . In contrast, little is known about the function of methylation in mRNAs and non-coding RNAs (ncRNAs), especially those of viruses. It has been shown that viral methylated RNA nucleotides can ablate the activity of dendritic cells 26 in mammals, reducing the immune response to foreign RNAs. The 5-mC may also play a role in the processing of ncRNAs into smaller fractions with regulatory potential 27 .
RNA editing consists of the C-6 deamination of adenine nucleotides by the host's adenosine deaminase acting on RNA 1 (ADAR1) enzyme. The resulting inosine (I) is recognized as guanine (G) by the reverse transcriptase, producing an A-to-G mismatch during cDNA sequencing 28 . Hyper-editing is the same phenomenon occurring at several adenines of a transcript. Hyper-editing may play a role in the cell's innate immunity 29 , while viruses can evade their inactivation and a strong immune response through the presence of hyper-edited sites on their RNA 30 . Hyper-editing may also be essential for the replication of some viruses 31 .
Next-generation sequencing (NGS) techniques provide massive amounts of highly accurate data on the structure and expression of genes 32,33 . However, because of the short read lengths produced by these platforms, they are inefficient in identifying transcript length isoforms, polycistronic RNAs, transcriptional overlaps, and characterizing gene expression 34 .
The Oxford Nanopore Technologies (ONT) and the Pacific Biosciences (PacBio) long-read sequencing (LRS) platforms can overcome these deficiencies by their ability to sequence full-length RNA molecules 35 , using cDNA 36 or direct RNA 37 (only ONT) sequencing. The ONT MinION platform works by measuring the electrical current fluctuations caused by the threading of a single-stranded polynucleotide through a nanopore fixed on a synthetic membrane 36,38 . The ONT MinION technology has no theoretical upper limit regarding its read length, but at present falls short of its competitors with respect to base calling precision 39 . Its ability to process full-length RNAs makes it an optimal choice for discovering novel transcripts and transcript isoforms in well-annotated genomes, in which case its error-prone base calling is not an issue 40 . The PacBio approach is based on Single-molecule, Real-time (SMRT) technology. The elongation of the DNA sequence is recorded as light impulses emitted when any of the four fluorescently labelled nucleotides is incorporated into the molecule. The single molecules are made circular by the addition of hairpin adaptors and are therefore sequenced in multiple passes in both forward and reverse orientations.
Reverse transcription (RT) and PCR are inevitable cDNA library preparation steps for both NGS and third-generation sequencing technologies. As discussed in our previous works 13,40,41 , both can lead to artifacts through template switching and false priming, which need to be considered during the analysis of novel TSSs and TESs. The direct sequencing of ribonucleotides with the ONT nanopore technology allows for the detection of modified and edited bases through the comparison of altered and canonical signals 37,42 , but this approach presupposes a known location of the modification or editing. This can be circumvented by generating models of the altered signals and fitting these to the unaltered signal 43 . At the time of writing, only one model, for detecting 5-methylcytosines, was ready for public use, with others expected in the near future.
The structure of the AcMNPV transcriptome was already characterized in a study applying Illumina short-read sequencing (SRS) of the 5' and 3' ends 12 , and in our work 13 using third-generation long-read cDNA and direct RNA sequencing. Other studies using microarray 44 , real-time PCR analysis 45 and Illumina SRS 12 focused on the characterization of transcriptional dynamics. The techniques used in these studies of gene expression are not well suited to tackle the complex structure of the baculovirus transcriptome, and they miss most of the overlapping transcript isoforms. The aims of this work were to update the AcMNPV transcriptome using a dual LRS approach, and to detect RNA methylation and editing using ONT sequencing.

Analysis of AcMNPV transcriptome using third-generation sequencing
In this study, we used the PacBio Sequel platform, powered by Single Molecule, Real-time (SMRT) sequencing technology, and the ONT MinION LRS platform to characterize the structure of the AcMNPV transcriptome (Fig. 1). Our earlier data obtained by the PacBio RSII system were also included in the analysis. The Sequel sequencing yielded a total of 47,880 Circular Consensus Sequences (CCS), of which 25,371 mapped to the viral genome and 23,884 to the insect host (Sf9 cells) genome. The total read count is less than the sum of the two mapped read counts because chimeric reads formed during library preparation mapped to both genomes. The cap-selected samples yielded a total of 6,862,026 reads, of which 198,516 mapped to the AcMNPV genome and 1,631,960 to the host genome, while the non-cap-selected samples yielded 1,119,716 reads, 290,039 mapping to the viral genome and 760,533 to the host. Sequel sequencing yielded longer mean mapped read lengths than ONT, while the cap-selected and non-cap-selected ONT reads had similar mapped read lengths (Table 1). The difference in mean read length between the two platforms can be explained by a step during Sequel library preparation used to mitigate the loading bias of PacBio sequencers, resulting in the loss of short cDNAs. Further details on read counts and read lengths can be found in Supplementary Table 2C). TATA-boxes were identified for 60 TSSs. The mean distance of a TATA box from the TSS is 32 nt. Twenty-two GC-boxes were identified, and their average distance from the TSSs is 66 nt. The average distance of the 15 identified CAAT-boxes from the TSSs is 108 nt.
The canonical CAGT initiators were present in only 6% of TSSs, the TAAG initiators were found in 61%, while the non-TAAG initiators were present in 33% of the cases (Fig. 2A). We detected a total of 875 transcripts (Table 2). The full transcript list is available in Supplementary Table 3A, the abundance of transcripts is available in Supplementary Fig. 2 and Supplementary Table 3B, and the transcripts themselves are depicted in Fig. 3.

The surrounding of the viral TESs was characterized by A/U-rich sequences with an increased adenine content immediately upstream of the cleavage site. Intriguingly, sequences harboring a PAS showed a slight increase of adenines between -26 and -12 nt upstream from the TES, whereas those without a PAS did not (Fig. 2B).

Length isoforms
We found that 330 transcript isoforms have longer or shorter 5'-UTRs than the previously annotated transcripts encoded by the same genes. Less than half (32.06%) of these show an E/L initiation region shift when compared to the initiator element (Inr) of their previously annotated transcript. In 7.06% of the TSS isoforms, the novel isoform is controlled by a TAAG-Inr whereas the previously annotated isoform encoded by the same gene is controlled by a non-TAAG-Inr.
Conversely, 25% of the TSS isoforms start at a non-TAAG-Inr whereas the previously annotated isoform of the same gene was annotated as a TAAG-Inr isoform. This phenomenon suggests that several AcMNPV genes are transcribed by both the host and the viral RNAP, resulting in altered 5'-UTR lengths. In addition to those described in a previous work 12 , we detected 57 genes which are transcribed by both the host and viral RNAP (Supplementary Table 4A). The length polymorphism of 5'-UTRs probably has biological relevance, but we cannot exclude that it represents mere transcriptional noise. In many cases, longer 5'-UTRs harbor upstream ORFs (uORFs), which have been shown to alter the translation of the protein coding sequence by ribosome reinitiation, ribosome stalling or disassociation and ribosome bypass 46,47 . We identified 75 gene products containing at least one uORF (Supplementary Table 4B).

In this work, we identified 340 novel TES transcript isoforms, of which 76.35% contain a canonical PAS upstream of their TESs. The phenomenon of nontemplated adenine addition by the viral RNAP has previously been demonstrated 18 . This in vitro study has also suggested a T-rich termination signal for this enzyme, and nontemplated thymine addition preceding adenine incorporation. In concordance with this work, we found that 51.85% of the 3'-UTR isoforms with a LIS terminate in the near vicinity (± 3 nt from the TES) of a T-rich region, in contrast to 22.51% of the 3'-UTR isoforms with a non-TAAG-Inr. However, we could not confirm the presence of nontemplated thymines upstream of the poly(A) tail.
The mean read length of transcripts is 1423.7 nt (σ = 913.190). Intriguingly, RNAs transcribed by the viral RNA polymerase (RNAP) are on average 500 nt longer than those transcribed by the host polymerase. The mean 5'-UTR length was 153.06 nt (σ = 270.438), while the mean 3'-UTR length was 529.09 nt (σ = 729.266), both measured from the first ORF overlapped by the transcript. The difference is significant for transcript length and 3'-UTR length, suggesting a tendency of the viral RNAP to produce longer RNA molecules.

Monocistronic transcripts
Several AcMNPV genes lack a precise transcript annotation due to the challenges facing SRS when assembling a genomic region with a complex pattern of transcriptional overlaps. Using LRS, we annotated 14 novel monocistronic transcripts with base-pair precision (Supplementary Table 3A). Canonical TATA boxes were observed upstream of the TSSs of ORF85 and ORF112-113. These transcripts start from a non-TAAG-Inr and harbor a canonical PAS upstream of their TES. The transcript coding for DNA polymerase is initiated at a canonical arthropod initiator (GCATA), while that coding for helicase is initiated at a similar but non-canonical sequence (GCAATA). Both DNAPOL and HEL harbor a canonical PAS upstream of their TES. Nine of the transcripts (ORF1629, P47, ORF72, ORF84, 38K, ORF108, PP34, P49 and ORF154) start at a TAAG-Inr, PP34 with a canonical CAAT sequence (CCAATC) 87 nt upstream of its TSS. Five of these transcripts show a PAS.

Non-coding transcripts
We detected 101 novel transcript isoforms that did not contain any previously annotated ORFs; two-thirds are longer than 200 nt and represent long non-coding RNAs, while one-third fall in the size range of short non-coding RNAs. Fifty-one genes are overlapped by a sense ncRNA. Only 10.3% of the non-coding isoforms have a canonical TATA promoter upstream of their TSS, while 70.5% of them start at a TAAG initiator, suggesting their late transcription. All of the sense ncRNAs are formed by the truncation of either the 5'-UTR or the 3'-UTR region of a previously annotated or novel monocistronic transcript. We detected 23 antisense RNAs encoded by the complementary DNA strands of 11 genes (Supplementary Table 3A).

Putative 5'-truncated mRNAs with in-frame ORFs
We detected 41 putative novel genes, which produce 5'-truncated versions of the canonical mRNAs and contain shorter in-frame ORFs. Nineteen of these transcripts are initiated at a TAAG sequence, 9 of which have a previously annotated isoform (EGT, DNAPOL, HCF1, PNK/PNL, HEL, HE65, 94K, IE1 and IE2) initiated at a non-TAAG-Inr, which suggests that early genes are partially transcribed by the viral RNAP at late time points. Intriguingly, eleven of the previously annotated transcripts (AC-BRO, POLH, ORF19, PP31, ORF66, ORF84, ODV-E25, BV/ODV-C42, ORF117, CHIT, ODV-EC27) starting at a TAAG sequence have 5'-truncated isoforms that are initiated at non-TAAG-Inrs. This implies the transcription of 5'-truncated isoforms of some late genes by the cellular RNAP.

Multigenic transcripts
Several very long multigenic RNA molecules were detected in the viral transcriptome. We designated as polycistronic those transcripts that exclusively contain tandem ORFs, whereas complex transcripts were defined as those multigenic transcripts that contain at least one ORF in the opposite orientation to the rest of the ORFs. In this study, we detected 241 polycistronic transcripts containing at least two ORFs. The main initiator motif of these long transcripts is the LIS, as 81.74% of the transcripts start at a TAAG sequence. A total of 79 complex transcripts were detected, of which 21 are transcript isoforms, while the rest are transcripts with unique locations. The longest complex transcript, P10-74-ME53-C-1, has only a single sense and two antisense ORFs, while ORF51-52-53-LEF-10-ORF54-55-56-C-1 has the highest number of ORFs (6 sense ORFs and 1 oppositely oriented ORF).
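The distinction between polycistronic and complex transcripts can be illustrated with a short Python sketch. The function name and the representation of ORF orientations as "+" and "-" strings are assumptions made for illustration; they are not taken from the annotation software used in this study.

```python
def classify_multigenic(orf_strands):
    """Classify a multigenic transcript from the orientations of its ORFs.

    `orf_strands` is a list such as ["+", "+", "-"] (hypothetical input
    format). Tandem-only ORFs give "polycistronic"; at least one ORF in
    the opposite orientation gives "complex".
    """
    if len(orf_strands) < 2:
        raise ValueError("a multigenic transcript needs at least two ORFs")
    if any(s not in ("+", "-") for s in orf_strands):
        raise ValueError("strands must be '+' or '-'")
    return "polycistronic" if len(set(orf_strands)) == 1 else "complex"


print(classify_multigenic(["+", "+", "+"]))   # tandem ORFs only
print(classify_multigenic(["+", "-", "-"]))   # mixed orientations
```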

Replication origin-associated transcripts
The homologous repeat (hr) regions are located at multiple genomic positions in AcMNPV. They are believed to contain the replication origins (Ori). Our LRS approach detected overlapping transcriptional activity at all of the 9 hr sequences. However, in the case of hr5, LoRTIA did not identify transcripts, although we could detect reads without exact TSSs and TESs. Altogether, 55 transcripts were detected at the hr regions, of which 50 contain a TAAG initiator sequence. Fifteen of these RNAs are multicistronic, 32 are TSS variants and 3 are TES isoforms, while 8 are monocistronic transcripts. Most of the overlapping transcripts (12) are transcribed at the genomic junction (at hr1) of the circular viral DNA, of which 7 are complex transcripts, 4 are TSS isoforms and 1 is a monocistronic RNA.

Splice isoforms
Chen and colleagues 12 previously reported twelve introns with an abundance above 1%. We detected five additional introns (Table 3). Twelve of the introns found in this study are cleaved at the canonical GT/AG splice junctions, while 1 is cleaved at a less common GC/AG junction. Chen and colleagues associated a spliced antisense transcript with ORF115. We detected 2 similarly positioned RNAs (ORF117-L-SP-1 and ORF117-L-SP-2). ORF117-L-SP-1 has introns matching those of the previously annotated transcript, while the other isoform has the same acceptor position, but its donor site is located 85 nt downstream from the previously annotated position. The TSS of these transcripts could not be annotated; however, according to our data it is located upstream of the TSS of ORF117-L-1. Splicing of ORF117-L-SP-2 results in a frame shift of the previously annotated ORF, and a novel 246-nt-long ORF, starting upstream of the previously annotated ATG, is formed.

Table 5). We detected 32 divergent transcriptional overlaps out of the 34 gene pairs, and 84 parallel overlaps out of the 87 gene pairs. We assume that a higher data coverage would detect overlaps in every transcript.

5-mC methylation
We used dRNA-Seq data for the detection of methylated nucleotides of AcMNPV transcripts using the Tombo software 43 . In order to decrease false positive results, we filtered out transcripts with a coverage lower than 30, and those for which the modified fraction was less than 30%. We found no significant correlation between the coverage and the raw fraction of methylated nucleotides (Fig. 4A). We identified a possible methylation consensus sequence (UUAC*CG; the modified C is labelled with an asterisk), which is supported by the distribution of log-likelihood ratios (Fig. 4B). The deviation from the canonical C sites can also be clearly detected (Fig. 4C). After filtering out the potential false positive sites, we obtained 319 putative 5-mC methylation positions in 12 viral genes (ac-39k, ac-bro, ac-ctl, ac-odv-e25, ac-orf-58, ac-orf-73, ac-orf-74, ac-orf-75, ac-p40, ac-p6.9, ac-polyhedryn, and ac-vp39).
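The two filtering thresholds described above (coverage of at least 30 and a modified fraction of at least 30%) can be sketched as follows. This is only an illustration: the `Site` record, its field names and the example values are hypothetical and do not reproduce the output format of Tombo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One candidate 5-mC position (hypothetical record layout)."""
    transcript: str
    position: int
    coverage: int             # number of reads covering the position
    modified_fraction: float  # fraction of reads called as modified, 0..1


def filter_sites(sites, min_coverage=30, min_fraction=0.30):
    """Keep sites that pass both thresholds named in the text."""
    return [
        s for s in sites
        if s.coverage >= min_coverage and s.modified_fraction >= min_fraction
    ]


# Made-up example values, for illustration only.
candidates = [
    Site("t1", 10, 50, 0.45),  # passes both thresholds
    Site("t1", 20, 12, 0.90),  # coverage too low
    Site("t2", 5, 80, 0.10),   # modified fraction too low
]
print(filter_sites(candidates))
```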

A to I RNA hyper-editing
Reads of ORF19-L show a high frequency of A to I (read as A to G by the sequencing) substitutions, which is not present in overlapping reads. We found that 50% of all substitutions are A to G (Fig. 5A and 5B) for ORF19, which is significantly higher than the 16.9% for overlapping transcripts in the same region (p < 0.0001, two-sided Fisher's exact test) (Fig. 5C). A substitution threshold of 16.9% was set to distinguish possible edited bases from the noise of sequencing inaccuracy. Our results show that 18% of all adenines of ORF19-L2 present a high level (x̅ = 0.839, σ = 0.153), while 4% of adenines of the overlapping reads present a low level of A to G editing (x̅ = 0.224, σ = 0.051). To identify the presence of a possible editing motif recognized by ADAR, we calculated the base frequency in the ± 5 nt surrounding the edited A. It has been previously demonstrated that a G-enriched neighborhood and an upstream U stabilize the RNA-ADAR complex in mammalian cells 48 . We could detect a significantly higher frequency of Us (χ²(1, N = 79,455) = 79,338.023, p < 0.01) immediately upstream of the edited base, while the frequency of Gs was only slightly higher (χ²(1, N = 79,454) = 79,340.021, p < 0.05) at the +5 position downstream of the edited base.
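The thresholding step can be illustrated with a small Python sketch that flags reference adenines whose A-to-G substitution fraction exceeds 16.9%. The pileup representation (one base counter per reference position) and the function name are assumptions made for this example; the actual analysis workflow may have differed.

```python
from collections import Counter


def edited_adenines(reference, pileup, threshold=0.169):
    """Return {position: A-to-G fraction} for candidate edited adenines.

    `reference` is the reference sequence and `pileup` a list holding one
    Counter of observed read bases per reference position (a hypothetical
    input format). Positions are 0-based.
    """
    edited = {}
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup)):
        if ref_base != "A":
            continue
        depth = sum(counts.values())
        if depth == 0:
            continue
        fraction = counts.get("G", 0) / depth
        if fraction > threshold:
            edited[pos] = fraction
    return edited


# Made-up pileup: position 0 is heavily edited, position 2 is within noise.
ref = "ACA"
pile = [Counter(A=2, G=8), Counter(C=10), Counter(A=9, G=1)]
print(edited_adenines(ref, pile))
```

A neighborhood analysis like the one described above could then tabulate the bases within ± 5 nt of each flagged position.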

Discussion
The standard next-generation sequencing techniques are limited by short read length, because the fragmented sequences have to be re-assembled computationally, during which much valuable information on the transcriptome is lost. LRS is particularly useful for the analysis of nested transcripts and alternatively spliced transcripts. In this study, we applied two LRS techniques, the SMRT Sequel platform from PacBio and the MinION platform from ONT, for profiling the AcMNPV transcriptome. We carried out amplified and direct RNA sequencing on the ONT platform, and also used our earlier PacBio RSII data 13 . Altogether, we identified 876 novel transcripts, including mRNAs, ncRNAs, mono- and multicistronic transcripts, transcript isoforms, and novel splice sites.
A stepwise truncation on both ends of the transcripts can be observed in several genomic regions. This has been shown to be present in other viruses 49 as well; however, many studies may confuse this phenomenon with RNA degradation or PCR artefacts, especially if it is present at the 5'-end of the reads. AcMNPV offers unique support for the existence of these variable-length isoforms through the presence of a LIS located at the TSS. The same TSSs and TESs are used by multiple transcript isoforms of neighboring or, in some cases, distant genes, resulting in polycistronic and complex transcript isoforms. This organization of the transcriptome, especially the intensive usage of one TSS for multiple transcript isoforms, is uncommon in model organisms such as human herpesvirus type 1 (HSV-1) 50 ; however, we observed a somewhat similar TSS usage in the African swine fever virus (ASFV) 51 , which is related to insect viruses, but not to baculoviruses.
Several multicistronic transcripts were detected in AcMNPV. Polycistronism represents the basic organization principle of prokaryotic genomes, but it is rare in eukaryotes. The reason is that in bacteria the Shine-Dalgarno sequences allow the translation of every gene in the mRNA 52 , whereas in eukaryotes only the most upstream gene of a multigenic transcript is translated because of the cap-dependent initiation system. However, polycistronism is very common in eukaryotic viruses 53 . The function of these multigenic transcripts is currently unknown because we have no evidence for the translation of downstream genes. It is hypothesized that transcriptional readthrough in tandem genes (and also in convergent genes) plays a role in a transcription interference-based mechanism 54 .
We observed that many of the longer TSS isoforms contain uORFs in their 5'-UTR, which may play a role in the regulation of translation 46,55 . These transcripts are also involved in the formation of transcriptional overlaps. AcMNPV resembles ASFV and vaccinia virus and differs from herpesviruses in that it exhibits higher heterogeneity in its TESs than in its TSSs. The alternative use of 3'-UTRs generates long tail-to-tail and tail-to-head transcriptional overlaps. This part of the transcripts may contain cis-regulatory elements, which can bind to regulatory proteins or microRNAs, thereby controlling the translation and the decay of mRNAs 56 .
In this study, we detected novel promoters, Inr sequences and poly(A) sequences. Additionally, we identified TAAG-Inr motifs, which bind the viral RNAP at the late and very late phases of the viral life cycle, and non-TAAG-Inr motifs recognized by both viral and host RNAPs at early time points. Our results clearly demonstrate that the viral RNAP generates longer transcripts than the host RNAP. The 3' cleavage of the viral RNAs and the formation of poly(A) tails are carried out by the polyadenylation machinery of both the host and the virus, although the latter is not well understood.
AcMNPV contains 9 AT-rich repetitive sequences (hr regions), which are thought to be replication origins 57,58 . However, others have demonstrated that none of them is essential for the replication 59 . We detected overlapping transcription from each hr region. They are assumed to play a role in the regulation of replication 60 . Such transcripts have been identified in other viruses, such as herpesviruses 61 .
We describe several 5'-truncated RNA molecules containing nested in-frame ORFs. This phenomenon has been described in other viruses 51,62 . It has to be determined whether these transcripts carry the information for N-terminally truncated polypeptides. If so, this kind of nested transcription significantly increases the coding potential of viruses. In this work, we detected a large number of low-abundance transcript isoforms. Their potential functional significance has to be ascertained.
We detected 5-mC methylation in 12 AcMNPV transcripts and identified the UUACCG sequence, which may be a methylation consensus sequence. Yang and coworkers have demonstrated that 5-mC nucleotides enhance nuclear export via the ALYREF adapter protein in mammalian cells 63 . Boyne and colleagues came to the same conclusion in Kaposi's sarcoma-associated herpesvirus 64 . ALYREF is also present in arthropods. We detected A to I hyper-editing in the longer TSS isoform of the ORF19 gene.
In cellular organisms, this process plays an important role in non-specific immunity 29 , which is unlikely to be the case in AcMNPV. The A-to-I editing is thought to decrease the affinity of antisense transcripts for the complementary mRNA by inhibiting the binding of dsRNA nucleases (such as RNase) 65 . As dRNA-Seq-based methylation and editing detection is still in its infancy, our results need further confirmation.

Cells and viral infection
The AcMNPV expressing the lacZ gene (βgal-AcMNPV) was propagated on the Sf9 cell line (both kindly provided by Ernő Duda Jr., Solvo Biotechnology, Hungary). Cells were cultivated in 200 ml of GIBCO Sf-900 II SFM insect cell medium (Thermo Fisher Scientific) in a Corning spinner flask (Merck) at 70 rpm and 26°C, and were infected at a multiplicity of infection (MOI) of 2 plaque-forming units per cell.

RNA purification
Total RNA was isolated using the Nucleospin RNA Kit (Macherey-Nagel) according to the manufacturer's instructions. In short, infected cells were collected by centrifugation and the cell membrane was disrupted by the addition of lysis buffer (provided in the kit). Genomic DNA was digested using the RNase-free rDNase solution (supplied with the kit). Samples were eluted in a total volume of 50 µl nuclease-free water. To eliminate residual DNA contamination, samples were treated with the TURBO DNA-free Kit

Data analysis and alignment
Barcoded reads were demultiplexed (ONT's software) into 9 separate time points and an additional category. Reads with a quality score > 7 from both the cap-selected and the demultiplexed barcoded datasets were aligned to the circularized genome of AcMNPV strain E2 (GenBank accession: KM667940.1) and the host cell genome (Spodoptera frugiperda isolate Sf9; BioProject accession: PRJNA380964) using Minimap2 v.2.11 67 .
We used SeqTools, our in-house scripts for the generation of the descriptive quality statistics of reads (ReadStatistics) and for the analysis of promoters (MotifFinder), which are available on GitHub: https://github.com/moldovannorbert/seqtools.
In this study, the LoRTIA pipeline (https://github.com/zsolt-balazs/LoRTIA, v.0.9.9), developed in our laboratory, was used for the identification of transcripts and transcript isoforms, as described earlier 66 . Briefly, sequencing adapters and homopolymer A sequences were checked by the LoRTIA software for the detection of TSSs and TESs, respectively. For the elimination of false transcript ends, the putative TSSs and TESs were tested against the Poisson distribution (using Bonferroni correction).
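The idea of testing putative transcript ends against a Poisson distribution with Bonferroni correction can be sketched as below. This is not the LoRTIA implementation: the choice of the window mean as the background rate, the significance level and the function names are all assumptions made for illustration.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summed term by term."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):     # accumulate P(X = 0) .. P(X = k - 1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def significant_ends(counts, alpha=0.05):
    """Return indices whose read-end count is unexpectedly high.

    `counts` lists the number of read ends at each position of a local
    window. The background rate is the window mean (an assumption of this
    sketch), and alpha is Bonferroni-corrected by the number of positions.
    """
    n = len(counts)
    if n == 0:
        return []
    lam = sum(counts) / n
    if lam == 0:
        return []
    corrected_alpha = alpha / n
    return [i for i, k in enumerate(counts) if poisson_sf(k, lam) < corrected_alpha]


# Made-up window: one sharp peak of read ends among sparse background.
window = [0, 1, 0, 50, 0, 1, 0, 0, 2, 0]
print(significant_ends(window))
```

Because the peak itself inflates the window mean, this sketch is conservative; a production tool would likely estimate the background differently.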
Introns were identified by applying the following criteria: they have one of the three most frequent splice consensus sequences (GT/AG, GC/AG, AT/AC), and their frequency exceeds 1‰ compared to the local coverage (Table 5). For transcript isoform annotation, those TSSs and TESs were selected which were present in at least two samples, while introns were selected if they were present in at least two samples and if their orientation matched the orientation of the reads in which they were present, as the LoRTIA software is blind to the orientation of the reads when looking for introns. Transcript isoforms were annotated for each sample using these features and the Transcript Annotator module of LoRTIA.
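The two intron criteria named above (a frequent splice consensus and a frequency above 1‰ of the local coverage) can be expressed in a few lines of Python. The function, its 0-based half-open coordinates and the restriction to the forward strand are assumptions of this sketch rather than a description of the LoRTIA code.

```python
SPLICE_CONSENSUS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def keep_intron(genome, start, end, supporting_reads, local_coverage,
                min_ratio=0.001):
    """Decide whether a forward-strand intron passes both criteria.

    `start` and `end` delimit the intron as a 0-based, half-open interval
    (an assumption of this sketch). The intron is kept if its donor and
    acceptor dinucleotides form a listed consensus pair and if the number
    of supporting reads exceeds `min_ratio` of the local coverage.
    """
    if local_coverage <= 0 or end - start < 4:
        return False
    donor = genome[start:start + 2].upper()
    acceptor = genome[end - 2:end].upper()
    if (donor, acceptor) not in SPLICE_CONSENSUS:
        return False
    return supporting_reads / local_coverage > min_ratio


# Toy sequence (hypothetical): intron spans indices 3..10, GT ... AG.
toy = "AAAGTCCCCAGTTT"
print(keep_intron(toy, 3, 11, supporting_reads=5, local_coverage=1000))
print(keep_intron(toy, 3, 11, supporting_reads=1, local_coverage=2000))
```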
A read was considered a transcript isoform if it started in the ± 5 nt vicinity of a TSS and ended in the ± 5 nt vicinity of a TES. Transcripts enclosing the same ORFs as a previously annotated transcript, but starting upstream of its TSS, were denoted longer (L) 5'-UTR isoforms, while those starting downstream were denoted shorter (S) 5'-UTR isoforms. Transcripts with the same ORFs as a previously annotated transcript, but ending upstream or downstream of its TES, were denoted transcript isoforms with alternative termination (AT). Transcripts with longer 5'- or 3'-UTRs overlapping multiple ORFs in the same orientation were considered polycistronic. If the TSS of a novel transcript isoform was positioned downstream of a previously described ORF's AUG, with an alternative in-frame start codon downstream from the TSS, the isoform was considered a putative protein-coding transcript, while those without a 5'-truncated ORF were considered 5'-truncated (TR) non-coding transcripts. Both of these transcript types are co-terminal with their previously annotated isoforms. If a transcript isoform started at the same TSS as a previously described protein-coding transcript, but its TES was located upstream of the previously described ORF's stop codon, the novel transcript was denoted as non-coding (NC). Transcripts in the opposite orientation to an annotated transcript were named non-coding antisense (AS) transcripts. Very long transcripts overlapping multiple ORFs in different orientations were denoted as complex (C) transcripts. Any other transcript configuration not containing a previously annotated ORF was denoted as non-coding (NC).
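The ± 5 nt matching rule can be sketched in Python as follows. The helper names and the choice to report the nearest annotated end when several fall inside the window are illustrative assumptions; they are not taken from the Transcript Annotator module.

```python
def nearest_within(position, candidates, window=5):
    """Return the candidate closest to `position` within ±window, or None."""
    hits = [c for c in candidates if abs(c - position) <= window]
    if not hits:
        return None
    return min(hits, key=lambda c: abs(c - position))


def match_isoform(read_start, read_end, tss_list, tes_list, window=5):
    """Assign a read to a (TSS, TES) pair if both ends fall within ±window.

    Coordinates are plain integers on one strand (hypothetical layout).
    Returns None if either end has no annotated feature nearby.
    """
    tss = nearest_within(read_start, tss_list, window)
    tes = nearest_within(read_end, tes_list, window)
    if tss is None or tes is None:
        return None
    return (tss, tes)


# Made-up coordinates for illustration.
print(match_isoform(1003, 2498, [1000, 1500], [2500]))  # both ends match
print(match_isoform(1010, 2498, [1000, 1500], [2500]))  # start too far away
```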

Data availability
The sequencing data and the transcriptome assembly have been uploaded to the European Nucleotide Archive under the project accession number PRJEB25619 for samples at separate time points and PRJEB24943 for the mixed and Cap-selected samples.