Implications of HBV-Driven Epigenetic Remodeling and Its Targets in Liver Cancer Stem Cells

Background: Elevated levels of STIM1, an endoplasmic reticulum Ca 2+ sensor/buffering protein, appear to be correlated with poor cancer prognosis where microRNAs are also known to be involved. We investigated a possible viral origin of specic microRNAs identied in Huh-7 human liver cancer stem cells with enhanced STIM1 expression. Methods: Computational strategies including phylogenetic analyses were performed on miRNome data obtained from Huh-7 liver cancer stem cells with enhanced STIM1 and/or Orai1 expression originally cultured in the present study. Results: Results revealed two putative regions in HBV genome based on apparent clustering pattern of stem loop sequences of microRNAs, including miR3653. Reciprocal analysis of these regions revealed critical human genes of which their transcripts are among the predicted targets of miR3653 which was increased signicantly by STIM1 enhancement. Conclusion: This study presents a phylogenetic evidence for an HBV-driven epigenetic remodeling to alter gene expression pattern associated with Ca 2+ homeostasis in STIM1-overexpressing liver cancer stem cells for a possible mutual survival outcome. A novel region on HBV-X protein may affect liver carcinogenesis in a genotype-dependent manner. Therefore, detection of the HBV genotype would have a clinical impact on prognosis.

expression and cellular factors associated with HBV pathogenesis [15]. Many host miRNAs appear to be associated with HBV-induced genomic instability and their expression pro le was altered in stable HBVexpression [16].
A possible causative role of STIM1 elevation in different types of cancer makes it an attractive chemotherapeutic target [14,17]. Therefore, this study investigates a possible phylogenetic relationship between HBV and certain miRNAs whose expression patterns were altered by STIM1 and/or Orai1 enhancement in Huh-7 HCC CSCs. Our analyses revealed certain viral genomic segments either as a possible origin of some miRNAs critical for cancer cell survival or edited in order to activate host's epigenetic defense system on behalf of the virus.

Methods
Cell culture Huh-7 HCC cell lines, originally from Dr. Jack Wands Laboratory (Massachusetts General Hospital, Boston, MA), were tested for authenticity via DNA pro ling (Applied Biosystem's Identi er kit, PN 4322288), at DNA Sequencing & Analysis Shared Resource, University of Colorado Cancer Center. The cells' identity was recon rmed by Idexx Bioresearch Company (Germany) before conducting present study. Mycoplasma contamination was monitored regularly in our laboratory by using MycoAlert Mycoplasma Detection Kit (Lonza). All cell culture processes were performed as described earlier [7].
Invasion assay 48 hours after transfection, control and STIM1-OE and Orai1-OE Huh-7 HCC CSCs, were cultivated in 6well plates (Corning, 8 µm-pore diameter, 6x10 4 cell/well) to investigate invasion and migration characteristics. After 24 hours, cells were stained and migrating cells were counted under microscope.
Data analyses PCR analyses for EMT markers mRNA expression levels of target genes (E-cadherin, N-cadherin and Vimentin) were determined using qRT-PCR (LightCycler 1.5, Roche). cDNAs were synthesized (Transcriptor High Fidelity cDNA synthesis kit) by using RNA samples (1 µg) isolated from the cells (HighPure RNA Isolation Kit, Roche). FastStart DNA Master SYBR Green I Kit (Roche) was used to determine mRNA expression levels (Supplementary Table   4). Serial dilutions of standard 18S rRNA cDNA-containing plasmid with known copy number was used in each PCR run to construct linear regression curve to calculate mRNA concentrations speci cally for each study as decribed earlier [7]. miRNome analysis Total RNA was extracted from cell samples using Total RNA Puri cation Kit (Norgen) which also allows collecting small non-coding RNAs including miRNAs, according to the manufacturer's instructions. RNA concentrations in each sample were determined by Nanodrop (Thermo Fischer Scienti c). Isolated total RNA samples were analyzed (Agilent Bioanalyzer 2100) to calculate the "RNA Integrity Number" (RIN) (1 to 10) of which the value close to 10 shows robustness of the RNA content. Therefore, the samples with RIN ≥7were used in library preparation. A stepwise procedure was performed to prepare cDNA libraries from small RNA samples via ligation of 3 'and 5' adapters, excision of the corresponding constructs from the gel, puri cation and cDNA preparation via RT reaction, according to the manufacturer's instructions. Regarding the critical step in gel electrophoresis, the portion between the lower limit of the 160 bp band and the upper limit of the 145 bp band was recovered from the gel to collect the cDNAs of interest as the size of the original mature miRNAs is approximately 22 nt which reaches 147-150 nt with the addition of 3 'and 5' adapters.
Next generation sequencing (Illumina NextSeq 500), was performed (DONE Genetics & Bioinformatics, Istanbul, Turkey) on cDNA libraries prepared from total small RNA samples isolated in our laboratory. In Solexa sequencing method used by Illumina, DNA fragments were rst placed on ow-cells and ampli ed locally to give high accuracy signal at the site of settlement (bridge-ampli cation). As a result, millions of clusters are formed on the ow-cell. Fragment clusters formed on the ow-cell were sequenced to read one base per cycle. In each cycle, DNA fragments were labeled with 4 different dyes and exposed to 4 different dinucleotides by the Sequencing-by-Synthesis method developed by Illumina. In each cycle, the clusters were visualized by uorescence camera following binding of the appropriate dinucleotide.
Images from each cluster were combined and the nucleotide sequences were deduced from the emission characteristics of the respective clusters.
Quality control: E ciency of the sequencing process was assessed by using FASTQC program (www.bioinformatics.babraham.ac.uk). Reading lters and trimming: During the sequencing process, poor quality base readings in the FastQ readout data were excluded from readings to avoid false positive results in subsequent analysis steps. Trimmomatic application (http://www.usadellab.org/cms/ ? Page=trimmomatic) was used for quality lters and trimming procedures. In order to detect miRNA sequences, the identi ed individual sequences were matched with BLAST via miRBase (www.mirbase.org). If the examined sequences match exactly the miRNA sequences in the miRBase, the corresponding sequence was considered as a miRNA candidate.

RNA-Seq data analysis
NextSeq 500 (Illumina) sequencing data were generated in *.bcl format. These les contain the location and IDs of each DNA clusters on the ow cell and the emission characteristics. From these data, *.fastq les were obtained by using Bcl2fastq v.2.1 (Illumina) program. At the same time, fastq data were divided into different samples according to the index data (Demultiplexing).

Phylogenetic analysis of miRNAs
Stem loop regions of the miRNAs, whose expression levels were signi cantly altered in CSCs due to STIM1 and/or Orai1 overexpression compared to their corresponding control Huh-7 cells, were retrieved from MirBase Database (http://www.mirbase.org/search.shtml). A dataset containing the stem loop sequences of 73 miRNAs has been built. In order to compare these sequences with HBV genome, the dataset have been constructed including these miRNAs and HBV genotypes A, B, C, D, E, F, G and H sequences. The sequences were aligned using MAFFT server (https://mafft.cbrc.jp/alignment/server/) and then manually edited by using Aliview software. Then, they were divided into sub datasets containing different DNA regions with the highest degree of sequence alignment and tentatively named as "Region 1" and "Region 2" for HBV. A Neighbor-joining tree (NJ), a Minimum Parsimony (MP), a Bayesian Phylogenetic Tree (BP), Minimum Evolution Tree (ME) and a minimum spanning tree (MST) based on the Kruskal genetic distance plot were drawn for each region using MEGA-7 software for ME, NJ and MP; Mr.
Bayes for the BP and an in-house script in R for MST. Pairwise genetic distances were calculated by MEGA software (version 7.0). The trees were compared and discussed based on the miRseq data obtained from Huh-7 HCC stem cells. Unresolved trees were discarded and miRNAs that changed upon plasmid-only transfection were excluded from further analysis.
The HBx regions aligned with some parts of the stem-loop sequences were analyzed separately due to the presence of numerous nucleotide mismatches. The resolved trees (MST and ME) were performed independently for distinct regions and only miRNAs clustered close to the viral genome in both approaches were evaluated. Due to sequence shortness, clusters emerged by two different mathematical models were used to con rm the similarity between the human miRNAs and the putative regions of HBV genome.
Genomic database analysis HBV Regions 1 and 2 were compared with the HBV genome (https://hbvdb.ibcp.fr/HBVdb/HBVdbGenome) in order to identify with which genetic region in the HBV genome they overlap and further BLAT alignment algorithm was used to identify a possible region within the human genome (hCG38) in which these regions are aligned (http://genome.ucsc.edu/cgi-bin/hgBlat). BLAT analysis was performed instead of BLAST since the sequences were too short for BLAST analysis.
Putative protein structure homology analysis The protein le for HBx was downloaded from SWISS-MODEL (https://swissmodel.expasy.org/repository/uniprot/Q156X2). The putative "Region 1" (amino acids between 10 and 34) and "Region 2" (amino acids between 117 and138) on HBx protein were labeled by using RasMol program for Windows (RasWin). In order to nd protein structures similar to the identi ed HBV regions, homologous structural templates have been searched and validated using the template search tools HHpred [18] and SwissModel [19]. Three dimensional structures have been constructed by using PyMOL (The PyMOL Molecular Graphics System, Version 2.0 Schroedinger, LLC). The QMEAN Z Score and GMQE (Global Model Quality Estimation) have been used to calculate the degree of nativeness of the structural feature observed in the model on a global scale and to estimate the quality of the 3D model, respectively [20].
Statistics PCR and invasion assay data are expressed as mean ± standard error of the mean. ''n'' represents the number of samples. Statistical signi cance between the means of two groups was evaluated using  (Fig. 1).

Phylogenetic analysis of miRNAs
Stem loop sequences of the miRNAs that were signi cantly altered upon STIM1 and/or Orai1enhancement, were used in phylogenetic analyses to quest their viral origin. Multiple alignment analyses revealed two putative HBV segments, "Region 1" and "Region 2" (Fig. 2, Supplementary Table 2), where most of the stem-loop sequences were aligned.
Due to presence of many unmatched nucleotides between these two regions, further phylogenetic analyses were performed separately for each region. Among all phylogenetic analyses, only minimum evolution (ME) and minimum spanning tree (MST) were able to draw resolved trees, whereas Neighbourhood Joining (NJ), Maximum Parsimony (MP) and Bayesian Parsimony (BP) algorithms were not operational. Therefore, stem loop sequences of miRNAs clustered close to HBV genotypes in both trees were analyzed further.
Genomic database analysis of putative HBV regions The identi ed HBV regions were used in BLAT analysis against the human genome (hCG38) (http://genome.ucsc.edu/cgi-bin/hgBlat). Although HBV "Region 1" did not match the human genome, "Region 2" showed several matching regions in a genotype-dependent manner (Supplementary Table 3).

Structural homology analysis
Locations of "Region 1" and "Region 2" on HBx protein were shown in Fig. 5A. For homology analysis, these regions within all different HBV genotypes were used as separate reference sequences to investigate if any homology exists with human proteins. Homologies were primarily identi ed for genotype F (DQ823095). The calculated QMEAN z Score was statistically signi cant (-4<QMEAN z Score<0) and the GMQE (Global Model Quality Estimation) score that is used to estimate the quality of the three-dimensional model was 0.4, considered an average for heterodimeric models constructed by short sequences according to the software algorithm guidelines. These results showed the convenience of the model for our purpose. Among all HBV genotypes tested, homology models were found only for "Region 2" templates. These models yielded two candidate proteins; membrane-associated guanylate kinases p55 subfamily member 7 (MPP7) (PDB code 3lra.1.A), for HBV genotypes A, B, E, G and H (Fig.  5B) and Latent Transforming Growth Factor β Binding Protein 1 (TGF β-LTBP-1) (PDB code 1ksq.1.A), for HBV genotypes C, D and F (Fig. 5C).

E-Cadherin expression changes
In order to test the effects of STIM1-and/or Orai1-enhancement in Huh-7 HCC CSCs, on "Epithelial to Mesenchymal Transition (EMT)" and "Mesenchymal to Epithelial Transition (MET)", transcription levels of epithelial (E-cadherin) and mesenchymal (N-cadherin and vimentin) markers, were determined. Ecadherin expression was signi cantly increased both by STIM1and STIM1 plus Orai1-enhancement (Fig.  6) while no signi cant change was observed in N-cadherin and vimentin (not shown).

Discussion
In order to partially reveal the molecular events behind HCC poor prognosis, STIM1 and/or Orai1 expression were enhanced in vitro in EpCAM-and CD133-expressing Huh-7 CSC subpopulation and the miRNA pro le was evaluated based on extensive NGS miRNome data. miRNAs, whose expressions signi cantly altered upon enhancement of STIM1 and/or Orai1, were further used in phylogenetic analyses to quest their viral origin. Multiple alignment results of the stem loop nucleotide sequences of these small non-coding RNAs revealed two putative HBV segments, as we proposed "Region 1" and "Region 2", where most of these sequences were partially clustered in the HBx-region of HBV genome.
Previously, miR3653 expression levels were shown to be signi cantly lower in HCC cells compared to that of non-cancerous hepatocytes and its overexpression was shown to inhibit growth and metastatic ability of HCC cells [21]. This appears to mimic the dormant state of CSCs. In our study, besides being signi cantly upregulated by enhancement of STIM1 and/or Orai1 in CSCs, miR3653 was the only miRNA whose stem-loop sequence clustered within both "Region 1" and "Region 2". In addition, its predicted targets have essential roles in carcinogenesis (http://www.targetscan.org/cgibin/targetscan/vert_71/targetscan.cgi?mirg=has -miR-3653-3p). Approximately 20-22 nt-segment of viral Region 2 was found within the intronic regions of some of these target genes (Supplementary Table 3). Cellular roles of six signi cant targets were discussed below in terms of HBV virulence and carcinogenesis.

Calpains (CAPNs):
Calpains are family of Ca 2+ -activated nonlysosomal intracellular cysteine proteases that cleave STIM1 [22]. STIM1 and SOCE are linked to angiogenesis and cancer cell survival, both indicating tumor aggressiveness and poor prognosis [23]. Suppression of calpain by upregulated miR3653 in STIM1enhanced HCC CSCs may have an additional survival outcome for the host.
Carboxylesterase 4A (CES4A): Certain parts of "Region 2" of HBV genotype A and F were found within the intronic region of CES4A. Carboxylases encoded by this gene are responsible for transesteri cation of certain endogeneous substrates and detoxi cation of various xenobiotics and drugs (https://www.genecards.org/cgibin/carddisp.pl?gene=CES4A). This gene was shown to be hypermethylated in HCC tissues [24]. Upregulation of miR3653(3p) in STIM1-and/or Orai1-overexpressing CSCs may suggest an additional epigenetic regulation of this enzyme in HCC. Hypermethylation of this gene and its posttranscriptional suppression by miR3653 can make cells vulnerable to xenobiotics. This dual epigenetic control could be a partial removal of the host defense system on behalf of the invading organism.
Centrosomal protein 76 kDa (CEP76): Being critical for many cellular processes, centriole numbers are strictly controlled since increase in their copy number is considered among the hallmarks of cancer [25]. CEP76 appears to suppress centriole ampli cation suggesting that elevated miR3653 may contribute to the dormancy of liver CSCs via CEP76.
Mucin 20 cell surface associated (MUC20): Overexpression of MUC20, which is considered as a prognostic marker in renal cancers (https://www.proteinatlas.org/ENSG00000176945-MUC20), involved in uncontrolled metastasis of colorectal cancer as well [26]. Some mucins also shown to be operational in primary liver lesions and HCC [27]. Integration of a certain segment of HBV genotype C in intronic regions of MUC20 and upregulation of miR3653 targeting MUC20 may make HBV-induced carcinogenesis liver resident, which is quite consistent with the hepatotropic property of the virus. This may explain why HCC does not metastasize extrahepatic tissues until advanced intrahepatic tumor stage IVA [28].
Beta-1,4-N-acetyl-galactosaminyl transferase 3 (B4GALNT3): Certain parts of "Region 2" of HBV genotype D were found within the intronic region of B4GALNT3, which was shown to be overexpressed in colon cancer cells regulating the stemness [29]. Knock-down of B4GALNT3 led to suppression of malignant phenotype [29]. Upregulation of miR3653(3p) in STIM1and/or Orai1-overexpressing CSCs may, therefore, be a host defense response against the malignancy.
This integration appears to be a part of the evolutionarily-triggered survival strategy of human cells hosting HBV.
Mitochondrial Ca 2+ uniporter (MCU): MCU, a transmembrane protein that allows Ca 2+ uptake into mitochondria, and its regulatory components MICU1 and MICU2 are among the targets of miR3653. Upregulation of miR3653 by STIM1-enhancement may account for survival of the CSCs by precluding Ca 2+ -induced apoptosis carried out by mitochondria.
On one hand, overexpression of STIM1 increases Ca 2+ storage capacity of ER, on the orther hand, miR3653 guarantees CSC survival. Consistent with the other miR3653 targets discussed above; downregulation of MCU also appears to be a part of the survival strategy of the host cells.
Upon BLAT analysis of the identi ed HBV regions against human reference genome, Region 2 was found in intronic regions of several human genes while Region 1 was not. Evaluation of 192 DNA sequences from tumor and corresponding non-tumor samples revealed direct repeat region (DR1), the most common fusion breakpoint of HBV genome [30]. Integration of the HBV "Region 2" into human genome, unlike "Region 1", may result from its close proximity to DR1.
Both Region-1 and Region-2 reside within the HBx-encoding region of HBV (https://hbvdb.ibcp.fr/HBVdb/HBVdbGenome). Previously, different parts of HBx protein were shown to behave differently to chemotherapy, indicating the possibility of designated roles for speci c parts of HBx [31]. Based on this information, the putative HBV "Region 2" (amino acids 117 to 138), may have functional impact on HCC development through genetic or epigenetic mechanisms. Moreover, "Region 2", is an overlapping region for HBx (nt. 1374-1835), HBV core promoter (nt. 1611-1647) and Enhancer II (nt. 1685-1742), as well as a binding site for several transcription factors including HNF3, HNF4, TBP, Sp1 [32,33], regulators of viral transcription [34]. This region also seems to be a hotspot for high-risk HCC mutations [32] having clinical impact on chronic HBV and HCC [33]. Therefore, investigation of the regulatory roles of this region both in HBV and human genome could have translational value.
A high sequence homology of "Region 2" in different HBV genotypes with different intronic regions in human genome is a serendipitous nding of this study. Although HBV integration known to occur randomly throughout the entire human genome, it may also show some genotype-dependent pattern [35,36]. Besides well-de ned preferred integration regions [37], viral insertion preference into variable locations in host genome through this speci c HBx region for different HBV genotypes is a novel observation.
In our in silico analysis, the putative "Region 2" was also found in intronic regions of CASC8, SLC32A1 and RPL23AP5, which were previously shown to harbor integrated HBV genome sections in HCC patients [35]. Therefore, extended analysis of the targets identi ed in this study in human samples may reveal a function of Region 2 in HCC prognosis. Integration of different HBx mutants into host genome was reported to be responsible for altered carcinogenesis in different patients [38]. Based on our data, we further suggest that patient speci c prognosis may also depend on this variable genomic integration.
Based on homology analysis, certain parts of two proteins, Mpp7 and TGF β-LTBP, which affect EMT processes, were revealed to share high homology with Region 2. Mpp7 plays role in formation of tight junctions between epithelial and endothelial cells, which are crucial for cancer progression and metastasis [39][40][41], as well as in establishment and maintenance of apicobasal polarity in epithelial tissues. Altered apicobasal polarity, which normally inhibits epithelial to mesenchymal transformation (EMT) to suppress metastasis [42], is one of the initial modi cations observed during carcinogenic transformation [43]. EMT, which is crucial for tumor progression, requires loss of cell to cell contacts and apical-basal polarity of epithelial cells [44,45]. The homology of "Region 2" of some HBV genotypes to Mpp7p suggested a possible involvement of this region in apicobasal polarity.
TGF-β has dual effects on HCC cancer cells by both suppressing or stimulating tumor development [11]. Inhibition of TGF-β signaling reduces the synthesis and release of connective tissue growth factor in tumor environment, which suppresses tumor cell growth, intravasation, and metastatic dissemination [46,47]. Serum levels of TGFβ-LTBP-1 measured in a cohort of Anti-HCV/HBsAg demonstrated a gradual increase in healthy individuals eventually developing liver cirrhosis and HCC [48]. EMT and enhancement of cell migration and invasion along with differential effects of STIM1 on TGFβ-induced EMT, appear to be correlated with elevated Ca 2+ in ux via SOCE as shown in breast cancer cells [49]. Along with increased proliferation rate as observed in our previous study [7], STIM1-and/or Orai1-overexpressed HCC CSCs appeared to be in an interim state favoring epithelial transition (MET). Signi cant elevation in epithelial marker E-cadherin along with upregulated mesenchymal markers (N-Cadherin and vimentin), presenting a partial EMT behavior, suggested that STIM1 and/ Orai1-enhancement leads CSCs into second phase (equilibrium) of 3Es (Elimination, Equilibrium and Escape) of cancer immunoediting process fascilitating a shift from EMT to MET depending on the microenvironment [50]. Signi cantly upregulated SOCE-related cytosolic Ca 2+ levels [7] appeared to be responsible for the enhanced invasion pattern of STIM1and Orai1-enhanced CSCs, as STIM1-enhancement, per se, inhibited invasion drastically.
In silico determination of "Region 2" for being highly homologous to certain regions in TGFβ-LTBP-1 and Mpp7p, and its integration into the human genome suggest that HBV indirectly affects EMT via two different pathways in order to promote liver cancer progression in a genotype-dependent manner. Therefore, identi cation of the HBV genotype in infections may reveal how EMT is regulated in HCC progression. Upregulation of epithelial marker, E-cadherin, upon STIM1 and/or Orai1 enhancement, emphasizes the impact of miR3653 along with Mpp7 and TGFβ-LTBP on EMT process which awaits further investigation of human gene targets where HBV Region 2 is integrated.
The miRNA expression pattern of CSCs upon mimicking poor prognosis in vitro and partial integration of "Region 2" in different intronic regions of human genome in various HBV genotypes along with protein homologies in silico raises the question whether each HBV genotype follows a different epigenetic strategy in carcinogenesis.
miR3653 appears to be a part of the host defense mechanism recognized by HBV as its stem loop sequence remnants are found in viral HBx region. This mechanism may be mutually favorable for both viral replication process and the host until the terminal stage of HCC. STIM1 upregulation favors host cell survival by upregulating miR3653 which suppresses both calpain and MCU, further guaranteeing uninterrupted cycle of protein machinery in the HBV-infected host cell (Fig. 7). Furthermore, having this versatile genetic repertoire, HBx appears to be a "master key" in virulence as well as virus-driven epigenetic regulation of hepatocarcinogenesis.

Conclusion
Together, our phylogenetic analyses suggest that a possible "survival strategy" of HBV-induced liver CSCs might have been developed throughout the evolution as HBV "Region 2" remnants were found in human genome. Lacking similar DNA stretches in antique (possibly 40-million-years-old) HBV might prove this genetic mimicry which disguises foe as friend. The presence of critical mRNAs coding for proapoptotic proteins among the targets of speci c miRNAs found in our study suggest that the virus might have developed a highly sophisticated survival toolbox in HBx region possibly via genome editing acquired throughout the long-lasting interaction with human cells. As University of Queensland virologist Paul Young quoted earlier "The boundaries between organisms are a bit more merged now, a bit more shadowy. We need to break down those boundaries. The more we look, the more we nd overlap.