Discovery and Characterization of a Hidden Retroviral Enhancer by Viral DNA-capture-seq Approach

Human T-cell leukemia virus type 1 (HTLV-1) is a retrovirus that causes a cancer of infected cells called adult T-cell leukemia (ATL). There is both sense and antisense transcription from the integrated provirus. Sense transcription tends to be suppressed, but antisense transcription is constitutively active in vivo even in proviruses lacking the 5’ long terminal repeat (LTR), a known viral enhancer and promoter. For several decades, various efforts have been made to elucidate the regulatory mechanism of the HTLV-1 provirus; however, it remains unknown how HTLV-1 antisense transcription is maintained. Here, using proviral DNA-capture followed by high-throughput sequencing, we found a previously unidentified viral enhancer not in the LTR but in the middle of the HTLV-1 provirus. The host transcription factors, SRF and ELK-1, bind to this enhancer region both in cell lines and in freshly isolated ATL cells. HTLV-1 containing mutations in the SRF- and ELK-1-binding sites showed markedly decreased chromatin openness at the viral enhancer, viral gene transcription, and enhancing effects on host gene transcription near the viral integration site. Aberrant host genome transcription was observed near the integration sites of defective proviruses containing the enhancer in ATL cells. This finding reveals how the exogenous retrovirus achieves persistent infection in the host via the internal viral enhancer and resolves certain long-standing questions concerning HTLV-1 infection. We anticipate that the DNA-capture-seq approach can be applied to analyze regulatory mechanisms of other oncogenic viruses integrated into the host cellular genome.

In this study, we screened transcriptional regulatory regions within the HTLV-1 provirus to identify nucleosome-free regions (NFRs), using a highly sensitive micrococcal nuclease sequencing (MNase-seq) approach, following our recently developed HTLV-1 DNA-capture-seq protocol 14,15 .
The results reveal an internal HTLV-1 enhancer, which had not been identified for 40 years since Poiesz et al. identified HTLV-1 in 1980 2 to the

HTLV-1 contains CTCF-binding sites, and viral integration therefore generates an ectopic CTCF-binding site in the host genome, which induces deregulation of host gene transcription via chromatin looping 27 . We demonstrate here that HTLV-1 generates an ectopic enhancer region together with a CTCF-binding site in the host genome. These findings indicate that HTLV-1 induces a distinct type of alteration of the host transcriptome via chromatin looping, thereby upregulating cancer-related genes near ISs, and might contribute to the selection of a specific infected cell for clonal expansion during the early phase of leukemogenesis.

These findings demonstrate that the NFR in the HTLV-1 pX region harbors several fundamental features of an enhancer region.

image of the signals around the NFR. NET-CAGE signals were visualized by IGV. Luc, luciferase; NFR, nucleosome-free region; N.S., not significant.

The host transcription factors SRF and ELK-1 bind to the intragenic HTLV-1 enhancer

The NFR we identified in this study is ~160 bp in length. We performed transcription factor binding prediction with the NFR sequence based on the consensus binding motifs of various transcription factors and found several candidates (Figure 3a). We analyzed their binding to the NFR using highly sensitive ChIP-seq analysis with an HTLV-1 DNA-capture approach 14 . The results demonstrated that SRF and ELK-1 co-localized to the NFR of the HTLV-1 proviral DNA (Figure 3b). Consistent with the involvement of SRF in the regulation of the 5ʹLTR 24 , we also observed the SRF signal in the 5ʹLTR region in TBX-4B cells, in which tax expression is active. Most importantly, the binding of SRF and ELK-1 to the NFR was observed in PBMCs freshly isolated from HTLV-1-infected individuals, indicating that this molecular mechanism operates in vivo in infected individuals.

Next, we performed electrophoretic mobility shift assays (EMSA) to investigate whether SRF and ELK-1 binding to the NFR depends on DNA sequence. We generated oligonucleotide probes for the NFR with a wild-type (WT) sequence (NFR-wt) and negative control probes targeting viral regions other than the NFR (Figure 3c). We observed a band

f Transcriptional regulatory function of the wt (black) or mut (pattern) NFRs was analyzed using the HBZ promoter (yellow) in Jurkat cells by luciferase assay. Luciferase activity was normalized to Renilla activity. Representative data from three independent experiments are shown as fold change relative to pGL4-basic-HBZ promoter (Student's t-test, *P < 0.05). ATL, adult T-cell leukemia/lymphoma; NFR, nucleosome-free region; wt, wild-type; mut, mutant; N.E., nuclear extract.
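The normalization described in the luciferase assay caption (firefly signal divided by Renilla signal, then expressed as fold change relative to the reference construct) can be illustrated with a minimal Python sketch. The function names, the averaging of replicate wells, and all readings below are hypothetical and chosen only for illustration; they are not data or code from this study.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Firefly luciferase signal divided by the Renilla signal of the same well."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def fold_change(sample_wells, reference_wells):
    """Mean normalized activity of a construct relative to the reference construct.

    Each argument is a list of (firefly, renilla) readings, one pair per well.
    Averaging replicate wells before taking the ratio is an assumption of this sketch.
    """
    sample = mean(normalized_activity(f, r) for f, r in sample_wells)
    reference = mean(normalized_activity(f, r) for f, r in reference_wells)
    return sample / reference


# Hypothetical readings (arbitrary units), not measurements from the paper.
reference_construct = [(1000.0, 500.0), (1100.0, 550.0), (900.0, 450.0)]
construct_with_nfr = [(3000.0, 500.0), (3300.0, 550.0), (2700.0, 450.0)]

print(fold_change(construct_with_nfr, reference_construct))
```

Dividing by the Renilla signal corrects for well-to-well differences in transfection efficiency before constructs are compared, which is why the ratio is taken per well rather than on pooled signals.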

SRF and ELK-1 play a critical role in HTLV-1 enhancer function

Next, we investigated the functional role of SRF/ELK-1 binding to the NFR in the context of the whole viral sequence. As the NFR is located in the coding region of the tax gene, we generated mutations of the SRF/ELK-1 binding site without altering the amino acid sequence of the Tax protein. Nucleotide substitutions could change mRNA stability and translational efficiency, but we confirmed that introduction of mut1, mut2, or mut3 did not change Tax protein levels (Figure S1a). We constructed HTLV-1 mutant molecular clones (HTLV-1-mut) containing the same mutations as mut3 (Figure 3d

Table S1. We then performed RNA-seq analysis using these clones and found read-through transcripts around the IS of the JET wt-HTLV-1-infected clone (Figure 5b) but not in the mutant-infected clones (Figure 5c). We further tested whether an ectopic enhancer inserted by HTLV-1 would alter host gene expression near ISs. The proportion of upregulated genes in JET clones infected with HTLV-1-wt was significantly higher than that in mutant HTLV-1 clones (P < 0.01; Figure 5d). It has been reported that viral CTCF plays a role in chromatin looping with the host CTCF-binding site and induces changes in host gene transcription 27 . Thus, we also analyzed CTCF binding to the host gene near ISs and found a high frequency of CTCF-binding sites in upregulated host genes (Figure 5d). We then used CRISPR/Cas9 to introduce the mutation that abrogated SRF-ELK-1 binding to the enhancer region (Figure 3d

single-cell resolution, we performed single-cell RNA-seq analysis using PBMCs from five ATL cases including the same ATL case as in Figure 6b and 6c, and in ATL cases containing defective proviruses.
Based on the T-cell receptor (TCR) clonotype and transcriptome data, we performed clustering analysis and found that the ATL clones, which were identified by the TCR clonotype, clustered differently from the other CD4+ T cell clones (Figure 6e and 6f). We then compared the transcriptome near the viral IS of CD4+ T cells among five ATL cases. There was remarkable upregulation of the local transcriptome only in the sample with viral integration (Figure 6g and 6h, Figure S4a-4c, left panels). Furthermore, there was a significant increase of the local transcriptome in the ATL clone but not in non-ATL CD4+ T cell clones (Figure 6g and 6h, Figure S4a

The size of the HTLV-1 genome is just over 9,000 bp. To achieve persistent infection in the host, HTLV-1 encodes several viral genes by alternative splicing in its small genome.

In conclusion, we have analyzed the HTLV-1 provirus integrated in the host genome with high resolution and efficiency using the HTLV-1 DNA-capture sequencing approach and discovered an internal viral enhancer in the HTLV-1 genome. This finding provides clues to help solve several long-standing questions related to HTLV-1 persistence and pathogenesis. Viral DNA-

includes the luciferase reporter gene. The NFR was inserted into pGL4-3ʹLTR300 or pGL4-5ʹLTR using BamHI or KpnI restriction sites, while the NFR mutant was inserted into pGL4-3ʹLTR300 using the BamHI restriction site. Primers associated with each construct and the NFR mutant are listed in Table S2.

We estimated the number of infected cells by quantifying the copy number of the tax gene normalized to the copy number of the ALB gene by digital droplet PCR as previously described but with minor modifications 15 . PVL was calculated as follows: PVL (%) = [(copy number of tax)/(copy number of albumin)/2] × 100. Primer sequences are listed in Table S3.

HTLV-1 DNA-capture seq.

Table S1. Integration site and strand direction of wt- and mut-infected clones
The profile of the cells infected with each molecular clone. The integration site and the strand direction of wt-infected clones and mut-infected clones.

Oligonucleotides for qRT-PCR and PVL measurement.

The sequences of primers and probes for qRT-PCR and PVL measurement are described. The sequences of labeled and non-labeled probes for EMSA are described.

Oligonucleotides for MNase assay.

The sequences of primers for the MNase assay are described.

Oligonucleotide for guide sequence used in CRISPR/Cas9 system.

The sequence of the oligonucleotide used to construct the guide sequence cloned into the pX330 plasmid is described.
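As a worked illustration of the proviral load (PVL) formula given in the methods, the following Python sketch computes PVL from digital PCR copy numbers. It assumes that the denominator of the formula is the albumin copy number divided by 2 (two ALB alleles per diploid cell, so that the denominator is the number of cells analyzed) and that each infected cell carries one copy of tax. The function name and the copy numbers are hypothetical.

```python
def proviral_load_percent(tax_copies, albumin_copies):
    """Proviral load as the percentage of cells carrying a tax copy.

    Assumption of this sketch: the cell number equals albumin copies / 2,
    because a diploid cell carries two copies of the ALB gene.
    """
    if albumin_copies <= 0:
        raise ValueError("albumin copy number must be positive")
    cells = albumin_copies / 2
    return 100.0 * tax_copies / cells


# Hypothetical copy numbers, not measurements from the paper:
# 2,000 ALB copies correspond to 1,000 cells; 150 tax copies give a PVL of 15%.
print(proviral_load_percent(150, 2000))
```

Under the one-copy-per-cell assumption, the value reads directly as the percentage of infected cells in the sample; clones carrying more than one provirus would make it an overestimate.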