Long-read Time-course Proling of the Host Cell Response to Herpesvirus Infection Using Nanopore and Synthetic Long-Read Transcriptome Sequencing

Third-generation sequencing is able to read full-length transcripts and thus to eciently identify RNA molecules and transcript isoforms, including transcript length and splice isoforms. In this study, we report the time-course proling of the effect of bovine alphaherpesvirus type 1 on the gene expression of bovine epithelial cells using direct cDNA sequencing carried out on MinION device of Oxford Nanopore Technologies. These investigations revealed a substantial up- and down-regulatory effect of the virus on several gene networks of the host cells, including those that are associated with antiviral response, as well as with viral transcription and translation. Additionally, we report a large number of novel bovine transcripts identied by nanopore and synthetic long-read sequencing. This study demonstrates that viral infection does not lead to a change in the average distance between promoters and transcription start sites, and between polyadenylation signals and transcription end sites. However, it causes differential expression of transcript isoforms. We could not detect an increased rate of transcriptional readthroughs as described in another alphaherpesvirus. According to our knowledge, this is the rst report on the use of LoopSeq for the analysis of eukaryotic transcriptomes. This is also the rst report on the application of nanopore sequencing for the kinetic characterization of cellular transcriptomes. This study also demonstrates the utility of nanopore sequencing for the characterization of dynamic changes of transcriptomes in any organisms. We our (GO) biological and molecular functions analysis these clusters we distinguished three functional groups. Genes basic cell functions, including morphogenesis, cell cycle regulation, signaling, catabolic pathways and respiration, are generally downregulated during viral infection. On the other hand, we observed a considerable upregulation of genes involved in antiviral response. Additionally, genes playing a role in transcription, RNA decay, translation and protein folding were also upregulated. Our analysis shows that most of these genes are associated to distinct molecular functions and biological processes indicating general response to virus infection. However, the rest of the unassociated genes could also be associated with either susceptibility to or defense against viral infection. We also identied a small set of immediate response genes that exhibited signicantly altered expression 1 hour after viral infection.


Introduction
Bovine alphaherpesvirus type 1 (BoHV-1) is a large DNA virus belonging to the Alphaherpesvirinae subfamily. This virus infects cattle and causes the disease commonly known as bovine respiratory disease, which leads to severe economic losses annually worldwide (van Oirschot, 1995). Like other alphaherpesviruses, such as herpes simplex virus type 1 (HSV-1), or pseudorabies virus (PRV), BoHV-1 also enters a latent state most commonly in the trigeminal ganglia following primary infection (Jones, 1998). From this state, the virus can be reactivated by various types of stress and can re-establish an acute infection (Nataraj et al., 1997).
Short-read sequencing (SRS) technology has expanded the frontiers of genomic and transcriptomic research due to its capacity to collect vast quantities of sequencing data at a relatively low cost. However, the past decade has witnessed incredible advances in long-read sequencing (LRS) technology. Besides the Paci c Biosciences and Oxford Nanopore Technologies platforms, Loop Genomics has recently also developed an LRS technique based on single molecule synthetic long-read sequencing (LoopSeq). LRS approaches present a strategy that is able to elude the limitations of SRS, including its ineffectiveness in the identi cation of transcript isoforms and in distinguishing overlapping RNA molecules. Recently, LRS techniques have been widely applied for the transcriptome analysis of a variety of organisms (Byrne et al., 2017;Chen et al., 2017;Tombácz et al., 2018a;Boldogkői et al., 2019;Zhao et al., 2019), including herpesviruses (Balázs et  These studies have uncovered a far more complex transcriptional landscape of the examined species than previously thought. Genome-wide sequencing assays have annotated the global transcriptome of BoHV-1 (Moldován et al., 2020), including microRNAs (Glazov et al., 2010). The effect of herpesvirus infection on host cell transcription using SRS (Illumina HiSeq) has been characterized by (Hu et al., 2016). In this paper, the authors described alternative splicing and polyadenylation in human skin broblast cells due to the infection by HSV-1.
In this work, we carried out a time-lapse assay for the examination of the effect of

Pre-processing and data analysis
The MinION data was base called using the Guppy base caller v. 3.4.1. with --qscore_ ltering. Reads with a Q-score greater than 7 were mapped to the host genome [Bos taurus Gene Bank accession GCF_002263795.1] using the Minimap2 aligner (Li, 2018).

Analysis of host cell gene expression
In order to assess the effect of the infection on host gene expression, we excluded MAPQ=0, secondary and supplementary alignments from all downstream analysis. The reads aligned to the host genome were associated to host genes according to the GCF_002263795.1_ARS-UCD1.2_genomic.gff genome coordinates. Only reads matching the exon structure of the host reference genes (using a +/-5 base pair window for matching exon start and end positions) were counted. We used edgeR_3.24.3 (McCarthy et al., 2012) with R version 3.5.1 for differential expression (DE) analysis, and ltered out host genes with less than ten reads in any of the three biological replicates. Since we had mock, 1h, 2h, 4h, 6h, 8h, 12h measurements, we used the GLM model (robust=True) and the TMM normalization method in the edgeR analysis. In our model, we tested for DE against mock expression for each time point using data from three biological replicates. To detect genes with signi cantly changed expression levels, we applied a 0.01 false discovery rate (FDR) threshold, with p-values adjusted by the Benjamini & Hochberg procedure.
Medians of normalized pseudo-counts of DE genes were exported from edgeR (Supplementary table S1). Gene expression levels were normalized to maximal expression levels and were then compared to each other to reveal which genes had similar expression kinetics during viral infection. Genes were clustered by their relative expression pro le using the amap_0.8-16 R package Kmeans function with the Euclidean distance method. Based on the Calinski criteria, our dataset had an optimal cluster number of 6. Using the identi ed subset of genes, we performed overrepresentation analysis for each cluster using the number of expressed genes as reference via the PANTHER (version 14.1 using the 2018_04 dataset release) (Mi et al., 2013) software tool. We summarized the results of our over-representation analysis (FDR<0.05) using the Gene Ontology (GO) biological processes and GO molecular functions annotation datasets.
Schematic representation of the work ow is shown in Supplementary Figure S1.

Annotation of Bos taurus transcripts
In this work, we applied the following techniques for the analysis of bovine transcriptome: (1) direct cDNA sequencing (dcDNA-Seq) based on oligo(dT)-primed reverse transcription (RT), (2) ampli ed cDNA sequencing based on random-oligonucleotide-primed RT using nanopore sequencing on ONT MinION platform, as well as (3) synthetic long-read sequencing (LoopSeq) on Illumina platform. All of the three techniques were used for bovine transcript annotation, whereas dcDNA-Seq was used for the time-varying analysis of the effect of BoHV-1 on host cell gene expression. For transcript detection and annotation, mapped reads were analyzed using the LoRTIA software suite developed in our laboratory (https://github.com/zsolt-balazs/LoRTIA).
For the annotation of introns, transcription start sites (TSSs), and transcription end sites (TESs), we set the criterion that these sequences have to be identi ed by the LoRTIA suit in at least two independent bovine cell samples. With this restriction, we identi ed altogether 11,025 TSSs, 21,317 TESs and 139,771 introns (Supplementary Table S2). Additionally, LoRTIA produced a total of 227,672 bovine transcripts (Supplementary Data Item S1). The median length of these transcripts was 1,678 nt (σ =2,386.5).
Three biological replicates were prepared for each time-point in dcDNA sequencing used for the timelapse experiment. Seven time points post infection (p.i.) and a mock-infected sample was used in each replicate for this part of the analysis ( Supplementary Fig. S1).
We identi ed consensus TATA boxes at a mean distance of 31.15 nt (σ = 2.96) upstream of bovine TSSs.
The polyadenylation signals (PASs) were located at a mean distance of 25.35 nt (σ = 8.26) upstream of the host TES. Our data show that viral infection does not induce signi cant changes in the distance between promoters and TSSs as well as between PASs and TESs ( Figure 1A and 1B). No signi cant modi cation was found in the sequence of the ±5 nt surrounding region of the TSS and the ±50 nt surrounding region of bovine gene TESs during the infection ( Figure 1C and 1D).
To assess changes in splicing, and the usage of TSSs and TESs of the host cell during BoHV-1 infection, we evaluated transcripts represented by more than ten reads in the infected samples (n = 69,726) reported by LoRTIA. We detected altogether 130 alternatively spliced transcripts ( Figure 2A).
FOS, an immediate responder of the stress signaling pathway, is quickly degraded if its third intron is retained (Jurado et al., 2007). We detected a non-spliced variant of FOS in very low abundance and additional splice variants of the transcript lacking the above-mentioned exon, which were present starting from the rst hour of the infection ( Figure 2B). This con rms previous reports on the presence of FOS in the early stages of viral infections (Rubio and Martin-Clemente, 1999;Hu et al., 2016).
The 3'-UTRs of genes often contain miRNA targets, contributing to mRNA degradation. Thus, shorter 3'-UTR length can lead to increased transcript stability (Mayr and Bartel, 2009), whereas longer 3'-UTRs can be targeted by several miRNAs and other trans-acting elements thereby generating distinct regulation patterns (Pereira et al., 2017). We detected 72 transcripts with TESs located further downstream and 122 transcripts with TESs located more upstream compared to transcripts in mock samples. Superoxide dismutase 1 (SOD1) confers protection against oxidative damage (Miao and St. Clair, 2009), including that induced by the IFN-I signaling (Bhattacharya et al., 2015). A 3'-UTR isoform of SOD1 detected in infected cells was shorter than that of found in the mock sample ( Figure 2C).
A previous work reported the disruption of transcript termination in the host caused by HSV-1 infection, resulting in extensive transcriptional overlaps between adjacent gene products (Rutkowski et al., 2015). According to our results, the length of polyadenylated transcripts remained constant during the infection (Figure 3). In order to investigate whether disruption of transcript termination also occurs in BoHV-1infected bovine cells and results non-polyadenylated transcripts, we carried out ONT sequencing based on random oligonucleotide-primed RT, and the obtained dataset was used for the analysis of transcription activity at the intergenic regions. Despite this library yielding a comparable measure of reads mapping to Bos taurus (n=2,222,987), we were unable to detect any substantial amount of fragments mapping to the intergenic regions.
Using LRS, we were able to differentiate between TSS isoforms. We detected 80 transcripts with upstream and 142 with downstream TSSs.

Overall host cell gene expression during the 12 hours of virus infection
This study investigated the effect of viral infection on the cultured bovine cells by a time-course transcriptome analysis using ONT LRS analysis. We carried out direct cDNA sequencing using three biological replicates in each of the six time-points (1h, 2h, 4h, 6h, 8h, 12h) and in the mock-infected sample. We identi ed a total of 8,342 host genes that produced more than ten transcripts in each of the three biological replicates. Applying differential expression (DE) analysis with a 0.01 false discovery rate (FDR) threshold, we identi ed 686 genes among the 8,342 host genes that exhibited signi cantly altered expression levels during the course of virus infection. Genes were clustered by their expression pro le and not by their absolute expression levels. In this part of the analysis, we transformed the time series of expression levels to a relative scale representing the expression changes between sampling points. This allowed to cluster the genes by their expression pro les during the course of viral transfection instead of their absolute abundance. We identi ed six clusters of genes with distinctive expression pro les ( Figure  4A and 4B and Supplementary Table S3). By analyzing mean expression pro les of gene clusters, we identi ed four groups of genes (clusters 2-5) that were constantly upregulated, a single group of genes where expression levels were steadily downregulated throughout the entire period of virus infection (cluster 6), and nally, one group that showed initial upregulation followed by downregulation (cluster 1).
We performed an over-representation analysis using the 8,342 genes as reference with the PANTHER software tool. We summarized the results of this analysis using GO (Gene Ontology) biological processes and GO molecular functions annotation datasets in Supplementary Table S3 (an FDR<0.05 was used). Over-represented genes were categorized into six functional groups according to the GO database ( Figure   4C) as follows: 296 genes play a role in cellular metabolism, 257 are involved in transcription and RNA decay, 242 in developmental and morphogenetic processes, 187 in immune response and host defense, 161 in translation and protein folding, whereas 61 genes are speci cally associated with in viral transcription related processes.
Genes of the rst cluster (n=53) had medium expression preceding the infection (which was transiently slightly upregulated at the 1h, and 2h p.i. time points) followed by downregulation at later measurements.
Genes in this cluster were over-represented in pathways controlling a wide variety of developmental and morphogenetic processes. Several genes coding transcription regulatory proteins present in this cluster show diminishing expression throughout the infection. Genes involved in the cytokine regulation of the immune response and in ammatory processes are also affected. The second and third cluster of genes (n=64, n=82 respectively) had medium expression preceding the infection that rose at each consecutive time points. The genes of these clusters were over-represented in functions and molecular processes that can be associated with viral gene expression and the virion assembly. An upregulation of genes involved in transcriptional and translational processes, as well as RNA decay was also observed. RNA decay can be an immediate response of the host cell to counteract the accumulation of viral transcripts, or it may be an effort of the virus to eliminate competing host mRNAs in order to facilitate the translation of viral transcripts (Smiley, 2004;Moon and Wilusz, 2013). Some of the over represented genes in these cluster are the members of GO molecular functions that have overlapping sets of genes. For example, the 12 genes (RPS26, RPL5, RPL30, RPS29, RPL31, RPS6, RPL36, RPL37, RPL8, RPS10, RPS21, RPL19) that were signi cantly upregulated during infection are the members of both the "viral transcription" and the "SRP-dependent co-translational protein targeting to membrane" pathways. Many of these genes are also members of the "nuclear-transcribed mRNA catabolic process, nonsense-mediated decay" pathway.
Genes in the fourth cluster (n=64) had low relative expression preceding the infection. These genes were upregulated following a sigmoid curve during the infection. Genes in the fourth cluster were not signi cantly over-represented in any of the GO molecular functions or GO biological processes. The fth cluster of genes (n=88) had zero or negligible expression preceding infection but showed an exponential increase in expression during the course of infection. The over-represented genes of this cluster were associated with anti-viral cellular and defense responses such as the type I interferon signaling pathways. The sixth cluster of genes (n=335) consisted of a huge variety of host genes with high expression preceding viral infection that showed sharp downregulation during the infection. These genes were over-represented in pathways associated with protein folding, cell cycle regulation and mitochondrial processes including aerobic respiration.

Key response host genes
We performed DE analysis (with FDR=0.01) on the Mock and 1-hour expression values to describe the immediate response of host cells. We identi ed 6 bovine genes that were signi cantly down regulated and 19 genes that were signi cantly up regulated in the three biological replicates (Supplementary table  S1). Over expression analysis revealed no signi cant association to either of the GO biological processes or GO molecular functions using the subset of up and down regulated genes or the whole set of genes. However, STRING association analysis revealed 4 networks between these genes. The rst gene network (GADD45B, GADD45A, DDIT3, ATF3, IFRD1, CARM1, SQSTM1) contains genes that are associated to host DNA damage response, transcription regulation. Furthermore, one gene plays a role in selective automacrophagy. The second network consists of two of the interferon gamma stimulated genes; IRF9, a transcription factor that plays an essential role in anti-viral activity and MT2A, a metallothionein protein.
The third network consists of two genes (SRSF5 and HNRNPDL) associated with pre-mRNA processing, transport and splicing regulation. The cytokine IL11which regulates hematopoietic cells was part of the fourth network. We found IL11 to be down-regulated. In contrast, CXCL5, a gene associated with neutrophil activation and also present in network 4, was up-regulated following virus infection. The remaining four (LASP1, HDAC7, SLC44A2 HSPG2) out of 6 down-regulated genes and eight (ID2,   HMGN3, TMEM190, TSC22D1, PRKAR2B, LOC100847759, LOC100847143, LOC100174924) out of 19 upregulated genes include signaling, transcriptional regulator, developmental genes.

Discussion
High-throughput long-read sequencing approaches are able to read full-length transcripts, and therefore allow a more comprehensive annotation of RNA molecules. LRS-based studies led to the discovery that the transcriptomes are much more complex than previously thought.
In this study, we annotated a large number of bovine transcripts and analyzed the effect of viral infection upon host gene expression. We found no signi cant change in the usage of promoters or PASs of the host genes. However, we observed an altered usage of transcript length and splice isoforms of the host RNA molecules. This indicates a modulation of cellular mRNA turnover. The analysis of TSS isoforms suggests that viral infection may have an effect on host mRNA translation, potentially through uORFs (Kronstad et al., 2013), or through other cis-acting elements, such as miRNA binding sites of 5'-UTRs. However, downstream TSSs can also result in truncated in-frame ORFs, which might code for N-terminally truncated polypeptides (Crofts et al., 1998;Tombácz et al., 2019). Unlike in HSV-1-infected cells (Rutkowski et al., 2015), we found no increase in the extent of transcriptional readthroughs in BoHV-1infected-cells.
Based on the alteration of expression kinetics, we detected six distinct gene clusters that had signi cantly changed expression during the course of virus infection. Based on the overrepresentation analysis of these clusters we distinguished three functional groups. Genes involved in basic cell functions, including morphogenesis, cell cycle regulation, signaling, catabolic pathways and aerobe respiration, are generally downregulated during viral infection. On the other hand, we observed a considerable upregulation of genes involved in antiviral response. Additionally, genes playing a role in transcription, RNA decay, translation and protein folding were also upregulated. Our analysis shows that most of these genes are associated to distinct molecular functions and biological processes indicating general response to virus infection. However, the rest of the unassociated genes could also be associated with either susceptibility to or defense against viral infection. We also identi ed a small set of immediate response genes that exhibited signi cantly altered expression 1 hour after viral infection.
Altogether, our data provides valuable resources for future functional studies and for understanding how the virus can overcome host defense mechanisms. Furthermore, these results may be helpful for the development of novel antiviral therapies.

Declarations
Funding This study was supported by OTKA K 128247 granted to ZB, by the OTKA FK 128252 and by the Lendület (Momentum) I Program of the Hungarian Academy of Sciences (LP-2020/8) granted to DT. The funding body had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Data availability statement
The sequencing datasets generated during this study are available at the European Nucleotide Archive's SRA database under the accession PRJEB33511 (https://www.ebi.ac.uk/ena/browser/view/PRJEB33511).

Con ict of interest
The authors declare no con ict of interest.