Passage of human cytomegalovirus (HCMV; species Human betaherpesvirus 5) in cell culture results in mutations ranging from small substitutions, insertions, or deletions to large-scale deletions, duplications, and rearrangements. Some mutations appear to be stochastic, whereas others consistently disrupt specific genes and confer growth advantages in certain cell types or growth conditions [1-3]. Of particular importance are mutations that invariably arise during passage in fibroblasts and alter one or more of three contiguous genes, UL128, UL130, and UL131A. These mutations prevent assembly of a pentameric complex (PC) on the virion surface that is important for entry into epithelial and endothelial cells but dispensable for entry into fibroblasts [4-8].
Many commonly used laboratory strains have large deletions as well as various additional mutations that disrupt PC expression, thereby rendering the virus non-epitheliotropic and non-endotheliotropic. Consequently, research has shifted toward the use of genetically more authentic strains such as the endothelial cell-adapted strain TB40/E. However, the designation “TB40/E” has been applied indiscriminately to heterogeneous stocks propagated from the original TB40/E stock, as well as to a variety of viruses derived from bacterial artificial chromosome (BAC) clones generated from such stocks [11, 12]. The sequences and tropisms of these viruses can differ significantly from each other and from the original TB40/E strain. For example, clone Lisa, a virus that was plaque-purified from a TB40/E stock, has a 1 bp insertion in UL128 causing a frameshift after codon 69, whereas the widely utilized BAC clone TB40-BAC4 carries a single nucleotide substitution in the intron between UL128 exons 2 and 3 that reduces splicing efficiency, lowers levels of the encoded protein, and impairs entry into epithelial cells. In contrast, a more recently constructed BAC clone, TB40-KL7-SE, has no obvious mutations impacting PC expression and is both endotheliotropic and epitheliotropic.
To begin addressing the diversity of TB40/E stocks and the impact of propagation using different cell types, a TB40/E stock that had been amplified twice in primary human foreskin fibroblasts (HFF; TB40/EF, stock 31519) and that entered retinal pigmented epithelial cells (ARPE-19; ATCC® CRL-2302) with poor efficiency was passaged five times in ARPE-19 cells. This generated a stock (TB40/EE, stock SE2) capable of infecting HFF and ARPE-19 cells with similar efficiencies. Despite efficient entry, the amounts of cell-free virus generated by ARPE-19 cells infected with TB40/EE were consistently 100- to 1,000-fold lower than those produced by HFF cells, revealing the existence of epithelial cell-specific post-entry restrictions. TB40/EE also exhibited an increased propensity to form multinucleated syncytia in ARPE-19 cell populations, suggesting an enhancement in the ability to induce cell-cell fusion during infection.
In the current study, we used long-read PacBio sequencing to examine in detail the genetic changes potentially associated with adaptation to epithelial cells. DNA was isolated from TB40/EF and TB40/EE cell-free virions as described previously. HiFi SMRTbell library construction and sequencing were performed at the Genomics Core at Virginia Commonwealth University as described previously (Qaffas et al., submitted). The data were processed using tools from the PacBio SMRT-Link command-line package (https://www.pacb.com/support/software-downloads/) with default settings. Two-modal polymerase reads for TB40/EF (25,124) or TB40/EE (8,364) were indexed using pbindex, XML files of the subread counts were produced using dataset, and 16,920 HiFi reads were generated for TB40/EF and 6,985 HiFi reads for TB40/EE using CCS. Final HCMV genome assemblies were made by reference-guided de novo assembly using LoReTTA v0.1 (https://bioinformatics.cvr.ac.uk/software/; Qaffas et al., submitted) with default settings and the sequence of strain TB40/E clone Lisa (13; GenBank accession no. KF297339.1) as the reference. HiFi reads were then mapped to the respective final assemblies using minimap2 v2.17-r941, and the read alignments were visualized using the Integrative Genomics Viewer. The consensus genome sequences were deposited in GenBank under accession numbers MW439038 (TB40/EF) and MW439039 (TB40/EE), and had median coverage depths of 202 and 43 reads/nucleotide, respectively. Differences between these sequences were identified, and these and other major heterogeneities noted during examination of the read alignments were quantified by counting their occurrence in the reads.
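The two summary statistics used here, median coverage depth and the frequency of a variant among the reads, can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the per-read allele representation, and the toy inputs are all assumptions made for illustration, and in practice the per-nucleotide depths and allele calls would be extracted from the read alignments.

```python
from statistics import median


def median_depth(depths):
    """Median of a sequence of per-nucleotide read depths."""
    return median(depths)


def variant_frequency(read_alleles, variant):
    """Percentage of informative reads that carry `variant` at one locus.

    `read_alleles` holds one allele call per read; None marks a read that
    does not span the locus and is excluded from the denominator.
    """
    informative = [a for a in read_alleles if a is not None]
    if not informative:
        raise ValueError("no reads cover this locus")
    carrying = sum(1 for a in informative if a == variant)
    return 100.0 * carrying / len(informative)


# Toy inputs for illustration only; these are NOT data from the study.
toy_depths = [10, 12, 15, 11, 14]
toy_calls = ["G", "T", "G", None, "T", "G", "G", "T", "G", "G"]

print(median_depth(toy_depths))
print(round(variant_frequency(toy_calls, "T"), 1))
```

Excluding reads that do not span the locus keeps the denominator restricted to reads that could have shown the variant, which matters when coverage is uneven.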
The HCMV genome (236 kbp) consists of unique long (UL) and unique short (US) regions, each of which is flanked by inverted repeats in the arrangement ab-UL-b’a’c’-US-ca (the primes denote the inverted repeats of a, b, and c). Comparison of the TB40/EF and TB40/EE consensus genome sequences identified 12 single nucleotide substitutions and one single nucleotide deletion within noncoding regions in the a, b, or c inverted repeats (Table 1). As all but one of these were replicated in the three a repeats (a at each terminus and a’ internally), both b repeats (b/b’), or both c repeats (c/c’), these loci represent only six unique differences. Although these mutations are in noncoding regions, the significant levels of enrichment suggest that they may provide a selective advantage in ARPE-19 cells. However, the inverted repeats have been reported to be particularly prone to mutation during passage of HCMV, with changes being generally replicated in all copies presumably as the result of recombination.
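The step from 13 differing loci to six unique differences can be checked with a short calculation. The sketch below is an inference from the counts stated in the text, not a restatement of Table 1. It assumes that the one unreplicated difference counts as a single locus, and that every other difference is present in all copies of its repeat (three copies for a, two for b or c). The function and parameter names are hypothetical.

```python
def repeat_breakdowns(total_loci=13, unique=6, unreplicated=1):
    """Enumerate (n_a, n_bc) splits consistent with the stated counts.

    n_a  : unique differences replicated in the three a repeats
    n_bc : unique differences replicated in both b or both c repeats
    """
    replicated_loci = total_loci - unreplicated   # loci explained by copies
    replicated_unique = unique - unreplicated     # unique replicated changes
    solutions = []
    for n_a in range(replicated_unique + 1):
        n_bc = replicated_unique - n_a
        if 3 * n_a + 2 * n_bc == replicated_loci:
            solutions.append((n_a, n_bc))
    return solutions


print(repeat_breakdowns())
```

Under these assumptions only one split satisfies both totals: two differences in the a repeats (six loci), three in the b or c repeats (six loci), and one unreplicated locus.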
Comparison of the TB40/EF and TB40/EE consensus genome sequences and examination of the read alignments also identified changes in coding regions (Table 2). Given the low efficiency of epithelial cell entry observed for TB40/EF, mutations disrupting UL128, UL130, or UL131A (encoding the PC subunit proteins UL128, UL130, and UL131A, respectively) were anticipated. Indeed, targeted sequencing of TB40/EF had previously identified a suppressor substitution converting the UL128 stop codon (TGA) to TTA (encoding leucine), thereby extending UL128 by 19 residues. Although the consensus genome sequence of TB40/EF did not reflect this mutation, examination of read frequencies revealed a subpopulation, comprising 30% of TB40/EF genomes, that contains this suppressor mutation. Further examination identified two additional subpopulations: one containing a 2 bp deletion causing a frameshift in UL128 and resulting in truncation of UL128 (44% of genomes), and one containing a single nucleotide substitution in UL130 resulting in a C207S substitution in UL130 (9% of genomes) (Table 2). There was no evidence for subpopulations with mutations in UL131A. The long length of PacBio reads connected not only the two loci in UL128, which are separated by 107 bp, but also the UL130 locus 814 bp beyond; among the connecting reads, those containing one mutation did not contain the others. Thus, consistent with the low epithelial cell entry efficiency of the TB40/EF stock, these findings suggest that cumulatively 83% of TB40/EF genomes contain mutations potentially impacting PC assembly or function. In contrast, and consistent with efficient epithelial cell entry, mutations impacting PC subunits were absent from TB40/EE.
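The cumulative figure of 83% is the simple sum of the three frequencies (30% + 44% + 9%), which holds only because no connecting read carried more than one of the mutations. The following Python sketch illustrates that logic on a hypothetical set of 100 spanning reads constructed to mirror the reported percentages. The read set, the mutation labels, and the function name are assumptions for illustration, not data or code from the study.

```python
from collections import Counter

# Hypothetical labels for the three PC-related mutations described in the text.
MUTATIONS = ("UL128_stop_to_Leu", "UL128_2bp_deletion", "UL130_C207S")


def summarize(reads):
    """Summarize reads that span all three loci.

    `reads` is a list of sets, each holding the mutations seen on one read.
    Returns (per-mutation %, mutually_exclusive flag, % of reads with any).
    """
    n = len(reads)
    counts = Counter(m for muts in reads for m in muts)
    freqs = {m: 100.0 * counts[m] / n for m in MUTATIONS}
    exclusive = all(len(muts) <= 1 for muts in reads)
    any_mutation = 100.0 * sum(1 for muts in reads if muts) / n
    return freqs, exclusive, any_mutation


# Hypothetical read set (NOT study data): 30 + 44 + 9 mutant reads, 17 wild type.
toy_reads = (
    [{"UL128_stop_to_Leu"}] * 30
    + [{"UL128_2bp_deletion"}] * 44
    + [{"UL130_C207S"}] * 9
    + [set()] * 17
)

freqs, exclusive, any_mutation = summarize(toy_reads)
print(freqs, exclusive, any_mutation)
```

If any read carried two mutations, the flag would be False and the fraction of genomes with at least one mutation would be smaller than the sum of the individual frequencies.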
Although it has not been demonstrated that the UL128 suppressor substitution disrupts the function of the PC, indirect evidence indicates that the UL130 C207S substitution is likely to have a negative effect. The crystal structure of the PC indicates the presence of a disulfide bond between C207 and C172, and the converse mutation, C172W, has been reported to occur during serial fibroblast passage of HCMV strain IgKG-H2 in conjunction with loss of epithelial cell tropism (Qaffas et al., submitted). Moreover, in HCMV strain Towne, a frameshift in UL130 after codon 203 replaces 11 C-terminal residues (including C207) with 26 novel residues, resulting in rapid degradation of the mutant protein and loss of endothelial cell tropism. These findings suggest that the C172-C207 disulfide bond is critical for the function or stability of UL130, and for its essential role in PC formation and epithelial cell entry. Curiously, the three mutations impacting PC subunit genes in TB40/EF are different from the mutations in clone Lisa or TB40-BAC4, and were not detected in reads from TB40/EE. Thus, within the available TB40/E lineages, at least five distinct mutations targeting PC subunit genes have been identified to date.
Five other changes impacting coding regions were also identified (Table 2). These included single nonsynonymous substitutions in genes UL26 (resulting in E98K in UL26), UL69 (H492Q in UL69), UL122 (D390H in IE2), and US28 (C320W in US28), and a two nucleotide insertion in gene UL141 introducing a frameshift truncating UL141. Examination of read frequencies revealed that most of these changes were enriched to a marginal or modest level: UL26 (from 53 to 100%), UL69 (from 44 to 52%), UL141 (from 44 to 69%) and US28 (from 88 to 100%). Thus, although these changes may be associated with improved replication in ARPE-19 cells, they may have been the consequence of stochastic effects. Moreover, both variants of UL69 and UL141 have been reported previously in consensus sequences of strain TB40/E, namely a partial TB40/E sequence (GenBank accession number AY446866.1), clone Lisa (KF297339.1), and the BAC clones TB40-BAC4 and TB40-KL7-SE (EF999921.1 and MF871618.1, respectively) [9, 11-13]. In UL69, clone Lisa and TB40-BAC4 encode Q492, whereas TB40-KL7-SE encodes H492. In UL141, the frameshift is absent from clone Lisa but present in the partial TB40/E sequence, TB40-BAC4, and TB40-KL7-SE. Thus, it appears that parental TB40/E stocks contained two variants of both genes, with capture of one or the other allele in the genomes of clone Lisa, TB40-BAC4, and TB40-KL7-SE resulting from cloning. In contrast, the prevalence of the D390H substitution in IE2 increased markedly from 5 to 87% (Table 2), suggesting strong selective pressure favoring this allele during ARPE-19 adaptation. The D390 allele is unique to strain TB40/E and is present in all currently reported TB40/E-derived sequences, whereas the H390 allele is conserved among all other sequenced HCMV strains. 
This observation is all the more interesting given that gene UL84 has been identified as being essential for replication in vitro in fibroblasts in the presence of the IE2 H390 allele, but non-essential in the presence of the D390 allele.
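The contrast between the IE2 change and the other four coding changes is easiest to see when the read frequencies quoted above are tabulated as changes in percentage points. The Python sketch below uses only the percentages stated in the text; the dictionary layout and function name are illustrative assumptions, and a change in percentage points is just one of several reasonable ways to express enrichment.

```python
# Read frequencies (%) in TB40/EF and TB40/EE, as quoted in the text (Table 2).
FREQUENCIES = {
    "UL26": (53, 100),
    "UL69": (44, 52),
    "UL141": (44, 69),
    "US28": (88, 100),
    "UL122 (IE2)": (5, 87),
}


def ranked_changes(freqs):
    """Return (gene, change in percentage points), largest change first."""
    deltas = [(gene, after - before) for gene, (before, after) in freqs.items()]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


for gene, delta in ranked_changes(FREQUENCIES):
    print(f"{gene}: +{delta} percentage points")
```

On this measure the IE2 change ranks first, ahead of UL26, which started from a much higher baseline in TB40/EF.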
In summary, whole genome sequencing identified variants impacting IE2 and PC subunits UL128 and UL130 as being potentially selected during adaptation of HCMV strain TB40/E for growth in epithelial cells. Enrichment of viral genomes lacking disruptive mutations in UL128 and UL130 is consistent with the detected improvement in efficiency of epithelial cell entry, and, as the PC has been associated with increased cell-cell fusion [22-25], may also explain the reported increase in syncytium formation. It is not known how the D390H polymorphism in IE2 determines the requirement for UL84 during fibroblast replication, or whether this phenomenon also extends to epithelial cells, but selection of genomes encoding the H390 allele suggests that IE2 and, perhaps, its interplay with UL84 provide important functions that are unique to HCMV replication in epithelial cells. Construction and phenotypic characterization of viral mutants containing these genetic changes in isolation are in progress to further elucidate the role of UL84 in the context of IE2 H390 or IE2 D390.