RNA Binding Proteins Shape Latent KSHV Genomic Structure: Identification of KSHV Episome Tethering Sites on Host Chromosomes


In previous studies, we have shown that expression of a viral lncRNA, polyadenylated nuclear RNA (PAN RNA), is essential for inducible viral genomic looping and distal gene activation during Kaposi's sarcoma-associated herpesvirus (KSHV) reactivation. Here we identify viral lncRNA binding proteins, and show that an underlying molecular mechanism regulating the KSHV latency-lytic replication switch is a viral lncRNA-CHD4 (chromodomain helicase DNA binding protein 4) interaction. Proximity biotin labeling, single cell transcriptomics, and siRNA screening along with complementation studies showed that CHD4's enzymatic activity silences viral gene expression by preventing transcription factory formation. Furthermore, Capture Hi-C, Cleavage Under Targets and Release Using Nuclease (CUT&RUN), and proteomics approaches together identify KSHV episome docking sites on host chromosomes and colocalization with a CHD4 protein complex, ChAHP, at epigenetically active genomic regions. PAN RNA binds CHD4 and competes with CHD4 DNA binding in vitro, and KSHV episomes detached from these host genomic loci when reactivation was triggered. Our studies suggest that CHD4 exhibits strong repressor function by preventing inducible genomic looping, and is therefore important for the ability of KSHV to establish and maintain latency in a "poised" state at selected host genomic loci.


Introduction
Kaposi's sarcoma-associated herpesvirus (KSHV), discovered in 1994, is one of eight human herpesviruses (1). Since then, KSHV has been identified as the causative agent of Kaposi's sarcoma (1)(2)(3) and two human lymphoproliferative diseases, primary effusion lymphoma (PEL) and AIDS-related multicentric Castleman's disease (4)(5)(6)(7). Although HAART (highly active antiretroviral therapy) of HIV-patients dramatically reduced KS incidence in Western countries, KSHV associated-malignancies are still a leading cause of cancer deaths in AIDS patients in Sub-Saharan Africa. In cancer cells, KSHV establishes a latent infection in which most of the viral genes are silenced. External stimuli trigger KSHV reactivation from latently infected cells (8,9).
Spatial and temporal organization of the genome play critical roles in gene regulation.
Temporal bending of a promoter to contact active enhancers is an essential step for gene expression (10). Through chromosome conformation capture analyses with deep sequencing (Capture Hi-C; CHi-C), we found that the structure of the KSHV genome shifts to increase genomic looping at K-Rta binding sites during KSHV reactivation (11). The viral long non-coding RNA (Poly Adenylated Nuclear RNA; PAN RNA) is a major K-Rta transcriptional target, and this genomic locus recruits 40-fold more cellular RNA polymerase II (RNA Pol II) molecules than the cellular ACTB gene, which is consistent with an exponentially higher copy number of PAN RNA molecules in reactivating cells (12). Imaging studies also showed that KSHV genomes assembled to form "transcriptional factories", in which a significant fraction of RNA Pol II was recruited to KSHV genomes during reactivation (12).
The high copy number of PAN RNA has been partly explained by its higher RNA stability (13). At least three sequence elements, ENE (expression and nuclear retention element) at the 3' region, MRE (Mta responsive element) at the 5' region of PAN RNA, and the structure of the poly(A) tail, are critical for PAN RNA stability (14)(15)(16). PAN RNA also interacts with multiple cellular and viral proteins (15,17), and functions as a transactivator by sequestering repressors and/or recruiting histone modifying enzymes (18)(19)(20). Expression of PAN RNA is also involved in increasing the efficacy of protein translation by interacting with poly(A)-binding protein C1 (21).
A recent study demonstrated that the loss of PAN RNA can be compensated for by other viral lncRNA sequences without a significant loss of KSHV replication, which suggests the presence of a sequence-independent function.
The KSHV ORF57 protein, also known as mRNA transcript accumulation factor (Mta), is an immediate-early gene product, which interacts with a number of RNA binding proteins and RNAs (22). ORF57 protein affects most aspects of RNA-mediated biology, including increasing splicing efficacy and RNA stability, facilitating mRNA translation (22), and inhibiting the formation of cellular RNA granules (23,24). lncRNAs associate with specific sets of RNA-binding proteins (RBPs) to form functional ribonucleoprotein (RNP) complexes. Some lncRNAs localize to particular nuclear bodies; for example, XIST localizes to the Barr bodies (inactive X chromosome), NEAT1 to paraspeckles, MALAT1 to nuclear speckles, TUG1 to polycomb bodies, and SATIII to nuclear stress bodies (25,26). Notably, RNA molecules are known to induce phase separation through intrinsically disordered regions (IDRs) of neighboring proteins in a context-dependent manner (27)(28)(29). An idea for the functional mode of lncRNAs in forming a non-membranous 'ribonucleoprotein (RNP) milieu' through recruitment of protein IDRs has recently emerged; this mechanism provides a flexible, inducible, and dynamic molecular platform in 3D nuclear space for miscellaneous components to assemble at sites of transcription. Identification of proteins involved in the regulation of the complex formation is therefore an important step towards understanding the molecular mechanisms of aggregation and their biological functions.
Recent studies identified an RNA-binding domain in the CTCF protein, which regulates formation of topologically associating domains (30,31), suggesting the presence of distinct classes of RNA-binding protein (RBP)-dependent genomic loops, which are likely regulated by nascent RNAs and other RBP partners.
In this study, we combined proximity biotin labeling with a recombinant bacterial artificial chromosome (BAC)-based KSHV manipulation system to probe for PAN RNA-binding proteins during reactivation. Our studies identified chromodomain helicase DNA binding protein 4 (CHD4), a component of the ChAHP complex, as a strong KSHV silencer by preventing induction of genomic looping and RNA pol II aggregate formation during reactivation. Importantly, KSHV episomes anchored on host chromosomes with the ChAHP complex at selective sites. We propose a model in which KSHV maintains latent chromatin in a "poised" state by utilizing the ChAHP complex to prevent enhancer-promoter interactions at epigenetically active sites.

KSHV RNA binding protein functions in static and inducible genomic looping.
The herpesvirus family encodes ORF57 (e.g., ICP27) homologs. An ORF57 knock-out KSHV has been generated, and studies showed that ORF57 is essential for KSHV reactivation/replication (32). The study also showed that supplementing ORF57 cDNA in trans did not completely rescue KSHV replication from ORF57-knock-out BAC transfected cells. The results suggested that ORF57 protein, like cellular RNA-binding proteins (30,31,33), may play a role in organizing KSHV genomic structure during the initial stage of de novo infection. Lack of ORF57 protein during the initial burst of transcription may make the viral genomic structure less responsive to stimuli. Accordingly, we performed CHi-C in order to examine static and dynamic genomic loop formation in iSLK cells stably-transfected with wild type (Wt) or ORF57-Stop recombinant KSHV BAC DNAs. ORF57 protein deletion along with the expression of other viral proteins was confirmed by immunoblotting (Fig. 1a). The CHi-C results showed that the ORF57-Stop mutation decreased long-range genomic loops (Fig. 1b, right panel). However, latent gene cluster regions (i.e., located around the 124-130 kb region) showed very similar frequencies and directionality of topologically associated domains (TADs) to those of wild type (Fig. 1b, bottom panels). PAN RNA genomic regions have the highest gene density in latent cells, indicating these regions were most frequently neighboring other KSHV genomic fragments through looping-mediated contacts (Fig. 1c), while latent gene cluster regions seem to avoid contacts with other viral genomic regions (Fig. 1c). Consistent with previous studies that demonstrated ORF57 regulates late gene expression, the genomic contacts most significantly affected by the ORF57-Stop were the late gene cluster regions, where genomic contacts with the immediate-early or early gene clusters (regions with active histone marks) were lost in ORF57-Stop cells (Fig. 1c, blue arrows).
Differences in induced genomic loops further confirmed that genomic interactions between late gene clusters and IE/E clusters at 24 h post-reactivation were reduced by the ORF57-Stop mutation (Fig. 1d, bottom panel; intersections show darker blue). However, the relative abundance of genomic loops inside TADs was instead increased in ORF57-Stop viral genomes. Reduced genomic contacts led viral gene transcription to decline more quickly (Fig. 1e), resulting in lower KSHV virion production in the culture supernatant (Fig. 1f). These results indicate that the viral RNA binding protein, ORF57 protein, has a function in shaping and/or maintaining viral genomic structure to optimize robust viral gene expression.

ORF57 is required to form nuclear aggregates associated with PAN RNA expression.
KSHV formed nuclear aggregates during reactivation, and we described the structures as transcription factories because of their colocalization with RNA pol II (12). A transcription factory is an aggregate of RNA, DNA, and protein molecules, and it is a flexible structure for gene transcription (34). As ORF57 protein is known to interact with PAN RNA, we asked if ORF57 protein plays a role in the formation of nuclear aggregates with PAN RNA. The sizes and numbers of the nuclear aggregates were quantified after visualizing PAN RNA with RNA-FISH, and cellular and viral proteins with immunofluorescence staining. The results showed that cells harboring ORF57-Stop KSHV exhibited few RNA pol II aggregates, while in cells with ORF57-Wt a significant fraction (>98.6%) of PAN RNA positive cells contained many RNA pol II aggregates. K-Rta protein, but not ORF57 protein, colocalized with RNA pol II in PAN RNA expressing cells, arguing against a requirement for the physical presence of ORF57 at transcriptional factories (Fig. 2a). There was a positive correlation between K-Rta signal intensity and PAN RNA expression in ORF57-Wt cells (Extended Data Fig. 1). However, such correlation was diminished in the absence of ORF57, and overall PAN RNA signals were significantly reduced in ORF57-Stop cells (Extended Data Fig. 1 (11), which phenotype-copied ORF57-Stop, also suggest that expression of robust viral lncRNA is important for transcription factory formation. To further confirm the significance of PAN RNA expression in distal promoter activation, we performed a reporter assay, using a bicistronic construct that encodes a PAN RNA expression cassette driven by K-Rta with a downstream ORF16 promoter controlling luciferase expression. A series of mutations were also inserted in the reporter to examine the molecular action of the viral lncRNA (Fig. 2c). The results showed that in the presence of wild type PAN RNA, the ORF16 promoter was synergistically activated by K-Rta and ORF57 protein.
This synergistic activation was lost by mutation of K-Rta responsive elements in the PAN RNA promoter or by insertion of a poly(A) signal immediately after the PAN RNA transcription initiation site (Fig. 2c). Deletion of the nuclear retention element (ENE) (14) did not affect ORF16 promoter activation significantly, indicating that the action of transcription in cis may play a larger role in distal promoter activation. These synergistic effects required PAN RNA expression, since deletion of the PAN RNA cassette eliminated the ORF16 promoter activation completely, and the ORF57 nuclear localization signal mutant also failed to enhance ORF16 promoter activation (Fig. 2c, right panels). Together, these results confirmed that PAN RNA expression is important for activation, or de-repression, of distal promoter(s).

Identification of lncRNA-binding dependent protein-protein interaction by proximity biotin labeling.
To understand the roles of ORF57 protein in distal promoter activation, we applied the proximity labeling technique to profile PAN RNA neighboring proteins during KSHV reactivation (35). The 3xFlag-mini-TurboID (mTID) cassette was inserted as an N-terminal fusion of ORF57, a PAN RNA binding protein; thus, the fusion protein is expressed from the endogenous promoter during KSHV reactivation (Fig. 3a). Taking advantage of a previous detailed mapping study, which identified ORF57 binding sites on PAN RNA, termed MRE (Mta Responsive Element) (15), we utilized a PAN MRE mutant virus having a 9-nucleotide mutation in the MRE element (Fig. 3a). KSHV genome-wide qRT-PCR array analysis showed that the PAN RNA MRE mutation impaired viral gene expression globally, with stronger effects being seen again in late gene cluster regions (Fig. 3b, right panel). Protein biotinylation happened only in the presence of both ORF57 protein (reactivated sample) and biotin, which confirmed specificity (Fig. 3c). By comparing the MRE mutant with Wt, we narrowed down the cellular proteins responsible for increasing PAN RNA transcripts. Proximity biotin labeling identified a total of 129 and 307 proteins from mTID-57 PAN Wt iSLK cells and mTID-57 PAN-MRE iSLK cells, respectively, as specific ORF57-interacting proteins with p values lower than 0.05 (Extended Data Table 1). Among the interacting proteins, 74 proteins were in common between the PAN MRE Wt and PAN MRE mutant, while 55 proteins were found only in the presence of the wild type PAN RNA sequence (Fig. 3d, e). Mutation of the MRE seemed to unleash ORF57 protein and allow ORF57 to interact more freely with other RNA binding proteins (Fig. 3d). Gene Ontology (GO) analysis indeed showed that the ORF57-interacting proteins are primarily involved in mRNA processing (Fig. 3f).

Chromodomain helicase DNA binding protein 4 (CHD4) is a strong KSHV gene silencer.
Having a list of cellular proteins that may play a role in the establishment of nuclear aggregates, we next performed siRNA-based screening of ORF57-neighboring proteins for functions in gene transcription using KSHV lytic gene expression as readout (Fig. 4a). Among the 55 cellular proteins that were specifically associated with PAN RNA containing a functional MRE, 12 cellular proteins suppressed KSHV gene expression more than 2-fold. In particular, knock-down of the most highly-enriched protein, CHD4, enhanced KSHV reactivation more than 5-fold, and CHD4 was the strongest repressor among 129 proteins (Fig. 4a). Conversely, functional re-introduction of CHD4 by overexpression of mouse Chd4 cDNA (i.e., in order to escape from the shRNA, which targets human CHD4) counteracted the effects of CHD4 knockdown (Fig. 4b). Notably, overexpression of mouse Chd4 almost completely abolished K-Rta mediated KSHV reactivation (Fig. 4b), and inhibited transcription factory formation (Fig. 4c). Strong silencing effects were helicase-activity dependent, since mutations in the helicase domain, which were identified in CHD4-associated syndrome (36,37), failed to inhibit KSHV reactivation (Fig. 4c, Extended Data

PAN RNA inhibits CHD4 DNA binding. PAN RNA MRE-dependent biotinylation by ORF57
suggested that CHD4 is an RNA-binding protein and that PAN RNA may bring the two proteins into proximity by serving as a scaffold (model shown in Fig. 3a). We purified these three components (Fig. 4d) and tested the possibility in isolated reactions. The results showed that CHD4 was indeed precipitated with RNAs under conditions in which neither the DNA-binding protein NF-kB (p65) nor Luciferase was precipitated with RNAs. CHD4 RNA-binding seemed sequence-independent and also did not require ORF57 protein (Fig. 4e). CHD4 was recruited to the KSHV genome mostly at TAD boundaries with active histone marks (H3K4me3), which include three lncRNA coding regions (Fig. 4f). In addition, CHD4 was clearly enriched at LANA binding sites such as the terminal repeat region in naturally infected PEL cells (Fig. 4f, Extended Data Fig. 4). Consistent with a previous study showing that the Drosophila melanogaster CHD4 homolog is capable of binding DNA and that human CHD4 ATPase activity is stimulated in the presence of naked DNA (38), purified CHD4 protein bound to a double-stranded (ds) DNA fragment encoding the PAN RNA sequence (Fig. 4g). Importantly, the presence of increased amounts of PAN RNA (ssRNA) clearly antagonized CHD4 dsDNA binding, which was completely blocked at a 1:10 (dsDNA/RNA) molecular ratio (Fig. 4g). serving as an anchor protein for chromosome tethering (39)(40)(41). We noted that there were two types of LANA CUT&RUN signals: one with sharp peaks characteristic of cellular transcriptional factor binding, and the other showing much stronger/higher peaks that often spanned more than 12 sequences (i.e., 0.8 kb/repeat x 35 copies of TR). Because those larger peaks clearly overlapped with CHi-C signals, we considered those host genomic regions to be the KSHV episome tethering sites. To our surprise, these KSHV tethering areas were also major CHD4 recruitment sites (Fig. 5c, left bottom panel).
We separated the peaks into two groups based on the peak distance spanned (e.g., >1 kb or default setting for MACS2 analysis) and then performed separate statistical analyses. The pairwise intersection-Jaccard statistic showed that the TR bound form of LANA was clearly colocalized with CHD4 in experimentally-infected iSLK cells as well as BCBL-1, with Jaccard similarity coefficients of 0.69 and 0.42, respectively (Fig. 5d, Extended Data Figure 7). A Venn diagram showed that CHD4 had 28 peaks that spanned more than 1 kb, and 27 of these CHD4 expanded-peaks colocalized with LANA peaks in BCBL-1 cells (Fig. 5d). As expected, acetylated histone (H3K27Ac) and RNA pol II had a larger number of such extended peaks along host chromosomes, and LANA binding sites were often colocalized with RNA pol II and active histone marks (e.g., 40 out of 64 sites; Fig. 4f, Fig. 5d). Co-occupancy analyses also showed that CHi-C reads were primarily localized at active genomic regions but not with H3K36me3 or H3K27me3 marks (Extended Data Figure 8). Interestingly, comparison of CHi-C contacts with reactivated samples indicated that reactivation detached KSHV episomes from host chromosomes (i.e., reducing relative contact reads with host chromatin; Fig. 5e). DNA-FISH performed with centromere-specific PNA (peptide nucleic acid) probes in combination with LANA immunostaining confirmed that the multiple LANA dots were closely localized with centromeres in the nucleus, further supporting the CHi-C results (Fig. 5f, Extended Data Figure 9). In addition, ENCODE atlas data obtained from high resolution Hi-C projects conducted with other cell lines indicated that these KSHV episome tethering sites have higher frequencies of genomic loops, indicating that the genomic regions possess characteristics of gene enhancers (Extended Data Fig. 6). Taken together, the results suggested that KSHV episomes are tethered to unique genomic loci (e.g., active genomic regions near heterochromatin), where locally concentrated CHD4 may function to silence active incoming episomes for latency establishment.
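The peak grouping and Jaccard comparison described above can be illustrated with a minimal sketch. It assumes a base-pair-level Jaccard coefficient (intersection length divided by union length) computed on a single chromosome; the function names and the example coordinates are hypothetical and are not taken from the analysis pipeline used in this study.

```python
# Minimal sketch (assumption: base-pair Jaccard on one chromosome).
# Intervals are half-open (start, end) tuples; all names are illustrative.

def merge(intervals):
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def covered_bp(intervals):
    """Total number of base pairs covered by a set of intervals."""
    return sum(end - start for start, end in merge(intervals))

def intersection_bp(a, b):
    """Number of base pairs covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def jaccard(a, b):
    """Jaccard coefficient: shared base pairs over base pairs in either set."""
    inter = intersection_bp(a, b)
    union = covered_bp(a) + covered_bp(b) - inter
    return inter / union if union else 0.0

def broad_peaks(peaks, min_width=1000):
    """Keep only peaks spanning more than min_width base pairs."""
    return [(s, e) for s, e in peaks if e - s > min_width]

# Made-up coordinates, for illustration only.
lana_peaks = [(0, 2000), (5000, 5300)]
chd4_peaks = [(500, 2500), (9000, 9100)]
similarity = jaccard(broad_peaks(lana_peaks), broad_peaks(chd4_peaks))
```

Filtering by width before computing the coefficient keeps the comparison restricted to the extended peaks, which is the group interpreted here as episome tethering sites.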
LANA interacts with CHD4. We next performed proximity biotin labeling with recombinant mini-mTID-LANA virus to examine the local chromatin environment of LANA. With this, we confirmed that KSHV LANA protein indeed localizes in close proximity to CHD4 in latently infected cells, supporting the CUT&RUN results (Fig. 5g, Extended Data Table 2). Importantly, the proteomics studies did not identify NuRD complex components, such as MTAs or HDACs. Instead, the study identified components of the ChAHP complex, another CHD4 protein complex, ADNP (Activity-dependent neuroprotector homeobox protein) and HP-1g interacting protein with high confidence (Fig. 5g), suggesting that episome-tethering regions are likely to be regulated by the ChAHP but not the NuRD complex. Pull-down studies with purified proteins demonstrated that LANA could interact with CHD4 in the absence of other molecules (Fig. 5h).

CHD4 is important for KSHV latency establishment.
The studies above suggest that CHD4 silences KSHV transcription, which consequently supports the establishment of KSHV latency.
We tested this hypothesis by knocking-down CHD4 in 293FT cells (Fig. 6a) and then infected these cells with KSHV to monitor viral gene silencing by collecting total RNA. Purified KSHVr2.19 was used for de novo infection with a MOI of 1. The results showed that KSHV gene expression continued to increase during a 3-day period in two independent CHD4 KD cells, but not in control shScramble cells (Fig. 6b). The results suggested that CHD4 plays an essential role in KSHV latency establishment (Fig. 6b) and maintenance (Fig. 4b) by silencing KSHV lytic gene expression.

Discussion
Recent studies indicated that an RNA-binding domain in the transcription factor CTCF is essential for the formation of TAD subsets (30,31). The presence of distinct classes of RNA-binding protein (RBP)-dependent genomic loops suggests the participation of nascent RNAs and other RBP partners in chromatin looping at transcriptionally active genomic loci. KSHV ORF57 protein is expressed during KSHV lytic replication, and we found that ORF57 expression affects genomic looping of the latent episomes via expression of viral lncRNA. The results suggested that the 3D latent genomic structure is regulated by a lytic transcription program mediated by immediate-early proteins through the recruitment of factors or nascent RNAs.
It is known that enhancer RNAs are part of transcriptional factory formation (42)(43)(44), and we propose that CHD4 enzymatic activity contributes to preventing the DNA/RNA aggregates and therefore inhibits recruitment of RNA-binding proteins (30); this action prevents RNA pol II from accessing other genomic regions and inhibits "recycling" of the assembled RNA pol II complex (Fig. 2c). Notably, loss of function mutations in CHD4 are associated with multiple developmental disorders (45)(46)(47)(48)(49)(50). We think that such losses may result in increased sizes of transcription factories and a propensity to form active enhancer-promoter interactions, which would prolong gene transcription; this was clearly seen in KSHV-infected CHD4-KD 293FT cells (Fig. 6b).
Our studies agree with a recent report, which demonstrated that conditional knock-out of CHD4 in brain tissue increases enhancer accessibility and cohesin binding to promote gene expression (51). The conditional CHD4 knock-out strengthens domain contacts and looping between promoters and enhancers (51). CHD4 regulation of super-enhancer accessibility in rhabdomyosarcoma has been found in association with cancer vulnerability (52). Importantly, in an epigenetic factor screening, CHD4 was identified as a strong repressor of KSHV replication (53). However, how KSHV episomes select specific CHD4 binding sites from many CHD4 bound genomic regions, or whether the selected CHD4 sites are linked to cell differentiation, tissue-specific gene expression and therefore KSHV tissue tropism, remains unknown. It would also be interesting to know if KSHV episome-bound "enhancers" come under the control of KSHV proteins in infected cells.
KSHV evolved to maintain a mysterious lncRNA, which is expressed at very high copy number in the presence of ORF57 protein, and deletion of PAN RNA or ORF57 significantly impairs the entire viral gene expression program (11,18,32). How does PAN RNA expression activate the expression of other viral open reading frames? Here, we provided evidence that PAN RNA binds CHD4 and relieves CHD4 from dsDNA binding as a transcription suppressor protein, thereby promoting viral lytic gene expression. Two viral immediate-early proteins, RTA and ORF57, synergize in this effect: RTA activates the PAN promoter for transcription and ORF57 stabilizes the transcribed PAN RNA. We speculate that very high copies of PAN RNA may overwhelm and sequester CHD4 away from the KSHV genomic enhancer regions. Because PAN RNA is expressed at over 10^5 copies, and CHD4 localizes near PAN RNA transcription initiation sites with LANA (Fig. 4f), we expect that the local PAN RNA concentration would easily result in an RNA-DNA ratio in excess of 10:1 (Fig. 4g). Based on frequent genomic looping, non-coding RNA expression, and CHD4 binding (e.g., which frequently targets enhancers), we also propose that the lncRNA encoding regions [Ori-RNA (T1.5), PAN RNA, K12 (T0.7)] act as inducible enhancers for the KSHV ORFs, and KSHV reactivation (e.g., promoter activation) hinges on lncRNA-CHD4 interactions for de-repression. Similar to CHD4, we previously demonstrated that PAN RNA expression also sequesters LANA from unique regions of KSHV genomes (19). Our CUT&RUN studies showed that CHD4 and LANA colocalized on both the KSHV and host genomes (Fig. 4f) and that they could physically interact with each other (Fig. 5h); thus, we favor the idea that reactivation is triggered by detachment of LANA/CHD4 complex-loaded TR fragments from the KSHV unique region (ORFs-encoded region) by robust lncRNA expression. Gene-density analysis indeed indicated such action, where the TR unit rapidly lost neighboring-DNA fragments during reactivation (Fig. 2C left panel). Notably, our KSHV 3D genomic structure modeling suggested that two highly inducible lncRNA encoding regions [PAN RNA, K12 (T0.7)] localize in close proximity to the TR region (Campbell et al., submitted). These results, combined with this study, may suggest that the CHD4-LANA complex has an architectural role in KSHV latent genomic structure, and that the TR region, which recruits a significant amount of both CHD4 and LANA (Extended Data Figure 4), is important both structurally and epigenetically. Disruption of such a "backbone" by CHD4 KD would therefore unleash a viral lytic gene transcription program (Fig. 4b, 6b).
Further studies will be needed to prove this.
The significance of CHD4 in maintaining KSHV latency is also supported at the single cell level. The probability of KSHV reactivation (KSHV transcripts) and CHD4 expression appears to be negatively correlated, but such negative correlation was not seen with HDAC2, another NuRD complex component expressed at a similar level. Proximity labeling with mini-TurboID LANA also did not identify HDACs or MTAs (Fig. 5g). Interestingly, Ostapcuk et al. reported another stable CHD4 containing protein complex, ChAHP, which consists of ADNP (activity-dependent neuroprotective protein), HP-1b/g, and CHD4 (54). We identified ADNP, CHD4, and HP1 binding protein 3 in proximity of LANA (Fig. 5g), suggesting that KSHV episomes may target ChAHP binding sites or that TR regions attract the ChAHP complex through multiple copies of CTCCC core sequences in a TR unit (54,55). Furthermore, ChAHP is reported to compete with CTCF binding, and therefore counteracts chromatin looping at CTCF binding sites and maintains evolutionarily conserved spatial chromatin organization by preventing new CTCF binding sites that emerged through short interspersed element expansions (55); this suggests that KSHV has found or built a "safe basecamp" in host chromosomes to maintain viral episomes during evolution (Extended Data Fig. 10). A recent study from Dr. Lieberman's group demonstrated that EBV episomes are tethered in the neighborhood of transcriptionally silent neuronal genes (56). Our CHi-C also found that KSHV episomes frequently localized at junctions between eu- and heterochromatin (Extended Data Fig. 10, junction of stripes).
In summary, we have discovered that the gene regulatory ChAHP complex is a key regulator of KSHV latency, and identified KSHV episome tethering sites on the host cellular chromosomes. Our studies suggested that CHD4 binds to DNA as a transcription suppressor to prevent genomic looping. However, PAN RNA expression mediated by RTA and ORF57 during lytic infection enhances viral gene expression by altering chromatin looping through its binding to CHD4 (Fig. 6c). Given the strong effects of CHD4 on KSHV replication, it will be important to study how ChAHP complexes are regulated and to determine the relationship with KSHV replication, including the association of KSHV-mediated diseases with individuals who possess CHD4 mutations.
for 10 min in a tube shaker at 500 rpm to release digested DNA fragments from the insoluble nuclear chromatin. The supernatant was collected after centrifugation (16,000 x g for 5 min at 4°C) and placed on a magnetic stand. DNA was extracted using a NucleoSpin kit (Takara Bio, Kusatsu, Shiga, Japan). Sequencing libraries were then prepared from 3 ng with the Kapa HyperPrep Kit (Roche) according to the manufacturer's standard protocol. Libraries were multiplex sequenced (2 x 150 bp, paired-end) on an Illumina HiSeq 4000 sequencing system to yield ~15 million mapped reads per sample. E. coli genomic DNA from the pAG-MNase incubation was used to normalize data as described previously (63).

Materials
Genomic data analysis. The HiC-Pro 2.11.1 pipeline (64) was used to align sequences from the Hi-C experiments against a combined assembly of reference genomes: the human hg19 (GRCh37) and KSHV (NC_009333.1). The reads were filtered to remove duplicates and to keep only reads that mapped to the KSHV genome by using the Python module Pysam 0.14.1. TADs, border strength, insulation scores, and density plots of the KSHV genome were analyzed using TADbit (65) following the developer's manual. Only uniquely mapped read pairs were extracted from the KSHV mapped reads by identifying the intersection of each read-end, and valid reads were obtained by removing reads classified as self-circle, dangling-end, error, extra dangling-end, too short, too large, duplicated, and random breaks. The valid reads were stored as matrices and binned at a resolution of 500 bp (2 kb for density plots); the bins with more than 100 counts and at least 75% of cells with non-zero counts were used in the next steps. Iterative Correction and Eigenvector decomposition (ICE) normalization was applied to the data with 100 iterations.
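The bin filtering and ICE normalization steps can be sketched as follows. This is a simplified illustration of iterative correction on a small symmetric matrix, not the TADbit implementation; the function names are hypothetical, and the reading of the bin filter (bin total above 100 and at least 75% of the bin's cells non-zero) is an assumption about how the thresholds are applied.

```python
# Minimal sketch of bin filtering and iterative correction (ICE).
# Assumptions: symmetric contact matrix given as a list of lists; the filter
# thresholds are interpreted per bin (matrix row). Names are illustrative.

def keep_bin(row, min_count=100, min_nonzero_fraction=0.75):
    """Return True if a bin has enough total counts and enough non-zero cells."""
    nonzero = sum(1 for value in row if value != 0)
    return sum(row) > min_count and nonzero / len(row) >= min_nonzero_fraction

def ice_normalize(matrix, iterations=100):
    """Iteratively divide each cell by the product of its row and column biases.

    The bias of a bin is its row sum relative to the mean row sum, so the
    procedure drives all bins towards equal total coverage.
    """
    n = len(matrix)
    balanced = [list(map(float, row)) for row in matrix]
    for _ in range(iterations):
        sums = [sum(row) for row in balanced]
        positive = [s for s in sums if s > 0]
        if not positive:
            break
        mean = sum(positive) / len(positive)
        # Empty bins get a neutral bias of 1 so they are left untouched.
        bias = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= bias[i] * bias[j]
    return balanced

# Made-up 3-bin contact matrix, for illustration only.
contacts = [[10, 4, 2],
            [4, 20, 6],
            [2, 6, 30]]
balanced = ice_normalize(contacts, iterations=100)
row_sums = [sum(row) for row in balanced]
```

After correction the row sums of `balanced` are approximately equal, which is the property that makes contact frequencies comparable between bins with different raw coverage.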
Insulation score/border strength was calculated using the TADbit algorithm (65) and visualized as a contact map. Gene density was plotted from the 3D model of each sample. Briefly, the Matrix Modeling Potential (MMP) score (66) was used to identify potential matrices, and the chosen matrices of all samples scored more than 0.91. The parameters of the matrices, including the maximal distance associated between two interacting particles, particle attraction, particle repulsion, and contact distance of particles, were optimized using the Monte Carlo optimization method. Sets of models were produced from possible combinations of those parameters, and used to build contact maps. The contact maps were then compared with the Hi-C interaction data by averages of Spearman correlation coefficients. The models with higher correlation coefficients were taken to represent the original data. The model correlations of all samples in this experiment were higher than 0.79.
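The comparison between model-derived contact maps and the Hi-C matrix can be illustrated with a short sketch. It assumes that the Spearman coefficient is computed over the upper triangle of each symmetric matrix and then averaged across models; this is one plausible reading of the procedure rather than the TADbit code itself, and all names and values are hypothetical.

```python
# Minimal sketch: Spearman correlation between contact maps, averaged over models.
# Assumptions: symmetric matrices as lists of lists; upper triangle only.

def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def upper_triangle(matrix):
    """Flatten the cells above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def mean_model_correlation(hic, model_maps):
    """Average Spearman coefficient between the Hi-C map and each model map."""
    target = upper_triangle(hic)
    scores = [spearman(target, upper_triangle(m)) for m in model_maps]
    return sum(scores) / len(scores)

# Made-up matrices, for illustration only.
hic = [[0, 9, 4], [9, 0, 7], [4, 7, 0]]
model_a = [[0, 0.9, 0.2], [0.9, 0, 0.5], [0.2, 0.5, 0]]
model_b = [[0, 0.5, 0.2], [0.5, 0, 0.9], [0.2, 0.9, 0]]
score = mean_model_correlation(hic, [model_a, model_b])
```

A rank-based coefficient is a natural choice here because model contact frequencies and Hi-C counts are on different scales, so only their ordering is compared.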
Valid interaction products called by HiCUP were converted into Juicebox (68) input format (.hic file), which stores the normalized and un-normalized contact matrices as a highly compressed binary file, by using a series of scripts provided by HiCUP (hicup2homer) and HOMER (makeTagDirectory and tagDir2hicFile) (69). Juicebox was utilized to facilitate adjustments of resolution and normalization, intensity scaling, zooming, and addition of annotation tracks.
Furthermore, the peaks called by MACS2 with a Q-value of 0.05 and sizes >1,000 bp were analyzed and visualized using Intervene v0.6.4 (73). Intersections of genomic peaks of 4 different protein binding sites were analyzed with default parameters and visualized as a Venn diagram. The similarity of positions of each pair of protein binding sites was computed using the Jaccard statistic and illustrated as pie charts.
Single cell RNA sequencing data analysis. Single cell data was analysed with the Cellranger v2.1 pipeline (10x Genomics). The pipeline included alignment to the hg38 human reference genome and human herpesvirus 8 strain (GQ994935.1) reference genome, t-distributed stochastic neighbor embedding (tSNE), and K-means clustering. The read count matrix obtained from the pipeline was normalized using the log normalization method with the "Seurat" R package (74).
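For readers unfamiliar with log normalization of single cell counts, the calculation can be sketched per cell as below. The sketch assumes the commonly used form in which each gene count is divided by the total counts of the cell, multiplied by a scale factor, and transformed with the natural logarithm of one plus the value; the scale factor of 10,000 is an assumption (it is not stated in this study), and the function name and counts are hypothetical.

```python
import math

# Minimal sketch of per-cell log normalization.
# Assumption: scale factor of 10,000 and natural-log transform of (1 + x).

def log_normalize(counts, scale_factor=10_000):
    """Normalize one cell's gene counts by library size, then log-transform."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(count / total * scale_factor) for count in counts]

# Made-up counts for three genes in one cell, for illustration only.
cell_counts = [1, 0, 9]
normalized = log_normalize(cell_counts)
```

Dividing by the per-cell total removes differences in sequencing depth between cells before expression values are compared or correlated.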
To perform correlation analysis, the KSHV gene expression was summarized, log transformed

formamide) on a glass slide on a heat plate at 57°C for 10 min and then at 37°C for another 30 min.
Utilization of PNA probes allowed us to avoid the harsh conditions often used for DNA hybridization, which we found produced unacceptable noise and were not readily compatible with subsequent protein staining. The coverslip was washed three times with warmed (57°C) PBS containing 0.1% Tween 20 in 6-well plates and once with PBS. Primary antibody for LANA staining was incubated in PBS with 1 mg/mL yeast tRNA. After washing with PBS three times, the secondary antibody (Alexa 488-anti Rat IgG) was incubated in PBS with yeast tRNA for 1 h at 37°C. After washing three times with PBS, the coverslips were mounted with antifade reagent (Thermo Fisher) and pictures were taken with a Keyence microscope as described above. Cells were monitored for GFP expression and RNA was purified using the RNeasy Mini Kit (Qiagen, Venlo, Netherlands).

Reporter assays.
Reporter assays were performed as described previously (75). A luciferase reporter system (Promega) was used to quantify ORF16 promoter activity. Briefly, the region from the PAN RNA promoter to immediately upstream of the translational initiation site of ORF16 was cloned into the pGL3 basic vector, and 500 ng of the resulting plasmid was used as a reporter. K-Rta responsive elements, insertion of poly(A) sites, and deletion of ENE elements were prepared by mutagenesis. Each reporter construct was transfected along with K-Rta and/or ORF57 expression plasmids at a ratio of 4:1 (reporter 4 and effector 1) in 293 cells. Two days after transfection, the cells were washed with PBS and incubated with 250 µl of passive lysis buffer provided by the manufacturer.
At least three independent assays were carried out for each experiment.
Purification of recombinant protein. Spodoptera frugiperda Sf9 cells (Millipore) were maintained in Ex-Cell 420 medium (Sigma), and recombinant baculoviruses were generated with the Bac-to-Bac system as previously described (76,77). The transfer plasmid, pFAST-BAC1 vector, was modified by inserting a Flag tag at the N-terminus, and CHD4, ORF57, p65, and Luciferase cDNAs

Quantification of viral replication. siRNAs targeting selected cellular genes were transfected into iSLK.219 cells for 48 h, followed by KSHV reactivation with doxycycline (1 µg/ml). After 24 h, the RFP fluorescence intensity was quantified using ImageJ software. The RFP signal intensity was normalized relative to that in cells transfected with non-targeting siRNA (NTC).
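The normalization to the non-targeting control can be expressed as a simple ratio. The sketch below assumes that RFP intensities have already been exported from ImageJ as plain numbers and that the control is summarized by its mean; the function name and the example values are hypothetical.

```python
from statistics import mean


def relative_rfp(intensities, ntc_intensities):
    """Express RFP intensities relative to the mean of the NTC wells.

    intensities: measurements for cells transfected with a targeting siRNA.
    ntc_intensities: measurements for non-targeting control (NTC) cells.
    """
    ntc_mean = mean(ntc_intensities)
    if ntc_mean == 0:
        raise ValueError("NTC mean intensity is zero; cannot normalize")
    return [value / ntc_mean for value in intensities]


# Hypothetical intensity values, for illustration only.
fold_changes = relative_rfp([200.0, 300.0], [100.0, 100.0])
```

A value above 1 would indicate a stronger RFP signal than in the control, and a value below 1 a weaker signal.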
Construction of mini-Turbo KSHV BAC16. Recombinant KSHV was prepared by following a protocol for En passant mutagenesis with a two-step markerless red recombination technique (79). Briefly, the codon-optimized mini-TurboID coding sequence (Table 1), which also encodes the 3x Flag tag, was first cloned into a pBS SK vector (Thermo Fisher, Waltham, MA USA). The pEPkan-S plasmid was used as a source of the kanamycin cassette, which includes the I-SceI restriction enzyme site at the 5'-end of the kanamycin coding region (79). The kanamycin cassette was amplified with the primer pairs listed in Table 1 and cloned into the mini-TurboID coding region at a unique restriction enzyme site. The resulting plasmid was used as a template for another round of PCR to prepare a transfer DNA fragment for markerless recombination with BAC16 (80). Recombinant BAC clones with insertion and deletion of the kanamycin cassette in the BAC16 genome were confirmed by colony PCR with appropriate primer pairs. Recombination junctions and adjacent genomic regions were amplified by PCR, and the resulting PCR products were directly sequenced with the same primers to confirm in-frame insertion of the mini-TurboID cassette into the BAC DNA. The resulting recombinant BAC was checked for large DNA deletions by restriction enzyme digestion (HindIII and BglII). Two independent BAC clones were generated for each mini-TurboID tagged recombinant virus as biological replicates, and one of the clones was used for protein identification. Entire BAC DNAs were subsequently sequenced. PAN RNA MRE mutations were introduced by using primer pairs that encode the intended mutations. These primer pairs were used to amplify the kanamycin cassette, and recombination, deletion of the kanamycin cassette, and confirmation of the mutations were performed as described above. Similarly, three stop codons were introduced into the first ORF57 exon by BAC recombination, and HA epitopes were inserted as an N-terminal fusion.
antibody incubations were performed as described previously (77).

Quantification of viral copy number. Two hundred microliters of cell culture supernatant was treated with 12 µg/ml of DNase I for 15 min at room temperature to degrade unencapsidated DNA. The reaction was stopped by the addition of EDTA to 5 mM followed by heating at 70°C for 15 min. Viral genomic DNA was purified using the QIAamp DNA Mini Kit according to the manufacturer's protocol and eluted in 100 µl of buffer AE. Four microliters of the eluate were used for real-time qPCR to determine viral copy number, as described previously (77).

algorithm. Peptide identifications were also required to exceed specific database search engine thresholds. Protein identifications were accepted if they could be established at greater than 5.0% probability to achieve an FDR less than 5.0% and contained at least 2 identified peptides.
Protein probabilities were assigned by the Protein Prophet algorithm (82). Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. Proteins sharing significant peptide evidence were grouped into clusters.
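The protein acceptance criteria stated above amount to a two-condition filter, sketched below. The record type, field names, and example entries are hypothetical. The default thresholds are taken from the text (probability greater than 5.0% and at least 2 identified peptides); in practice the probability cutoff is the one that the analysis software selects to reach the target FDR, which this sketch does not compute.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    """Hypothetical record for one identified protein."""
    name: str
    probability: float   # protein probability on a 0-1 scale
    peptide_count: int   # number of identified peptides


def accept_hits(hits, min_probability=0.05, min_peptides=2):
    """Keep hits above the probability cutoff with enough peptides."""
    return [
        hit for hit in hits
        if hit.probability > min_probability
        and hit.peptide_count >= min_peptides
    ]


# Hypothetical identifications, for illustration only.
hits = [
    ProteinHit("protein_A", 0.99, 5),
    ProteinHit("protein_B", 0.99, 1),   # rejected: only one peptide
    ProteinHit("protein_C", 0.03, 4),   # rejected: probability too low
]
accepted = [hit.name for hit in accept_hits(hits)]
```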
Pathway analysis. The proteins identified as interacting with ORF57 were used for gene ontology analysis. The top enriched gene ontology processes were identified with the Metascape web-based platform (83).

Statistical analysis.
Results are shown as mean ± SD from at least three independent experiments. Data were analyzed using unpaired Student's t test, or ANOVA followed by Tukey's HSD test. A value of p<0.05 was considered statistically significant.
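As a worked illustration of the summary statistics named above, the sketch below computes the mean ± SD of a group and the unpaired, equal-variance Student's t statistic for two groups. It stops at the t statistic and the degrees of freedom: converting these to a p-value requires the t distribution, which a statistics package would normally supply. The function names and the example values are assumptions for this example.

```python
from statistics import mean, stdev, variance


def mean_sd(values):
    """Return (mean, sample standard deviation) of one group."""
    return mean(values), stdev(values)


def student_t(group_a, group_b):
    """Unpaired Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its df.
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / df
    standard_error = (pooled * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical triplicate measurements, for illustration only.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t_value, dof = student_t(control, treated)
```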

Competing financial interests.
The authors have declared that no conflicts of interest exist.