Our study has provided a more accurate and detailed description of the chromosome abnormalities in U937. Our method included SNP array to identify rearranged segments, metaphase FISH to localise these segments and further characterise abnormal chromosomes, and B allele frequencies to distinguish between rearrangements of the two different homologues. B allele frequency data from SNP array analysis also allows a comparison of the number of copies of each homologue of a chromosome pair, and is sometimes useful for estimating copy number.
For most chromosomes, the contribution of each of the two homologues could be differentiated using B allele frequency data. Importantly, when there was more than one abnormality of a chromosome, B allele frequency data often allowed us to determine whether these involved the same or different homologues (as demonstrated in Fig. 4). For example, two different chromosome 16 rearrangements that could potentially have involved the same chromosome 16 were shown to instead involve not only two different chromosomes 16 but also the two different homologues (Fig. 4). Another case in point is the der(10)t(10;11): the portion of chromosome 10 distal to the duplication is composed of material from the other 10 homologue, i.e. there was gene conversion of this region (see Fig. 4). This duplication appears to have arisen via an unbalanced translocation between the der(10)t(10;11) and one copy of the duplicated normal chromosome 10 presumed to have been formed at triploidisation (“green” homologue), with loss of material at the breakpoints (see Results and Fig. 4).
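The reasoning behind using B allele frequency to compare homologues can be illustrated with a minimal sketch. It assumes the standard idealised model in which heterozygous SNPs in a region carrying a and b copies of the two homologues cluster at a/(a+b) and b/(a+b). The function names, the tolerance and the copy number ceiling are hypothetical choices made for illustration; they are not part of the analysis pipeline used in this study.

```python
from fractions import Fraction


def expected_het_baf(copies_a: int, copies_b: int) -> list[Fraction]:
    """Idealised B allele frequency bands for SNPs that were heterozygous
    in the germline, given the copy number of each homologue."""
    total = copies_a + copies_b
    if total == 0:
        raise ValueError("region is absent (zero copies)")
    # A set removes the duplicate band when both homologues are equal.
    return sorted({Fraction(copies_a, total), Fraction(copies_b, total)})


def homologue_ratio_candidates(observed_baf: float,
                               max_total: int = 6,
                               tol: float = 0.03) -> list[tuple[int, int]]:
    """List (minor, major) homologue copy counts whose lower expected band
    lies within `tol` of the observed lower band.
    `max_total` and `tol` are illustrative assumptions."""
    hits = []
    for total in range(1, max_total + 1):
        for minor in range(0, total // 2 + 1):
            if abs(minor / total - observed_baf) <= tol:
                hits.append((minor, total - minor))
    return hits


# Three copies split 2:1 give bands at 1/3 and 2/3.
print(expected_het_baf(2, 1))
# Four copies split 2:2 give a single band at 1/2.
print(expected_het_baf(2, 2))
# A lower band near 0.33 fits 1:2 but also 2:4.
print(homologue_ratio_candidates(0.33))
```

The last call shows why B allele frequency is only sometimes useful for estimating copy number: the bands reflect the ratio of the homologues, so a 1:2 and a 2:4 split are indistinguishable without the total copy number from the array intensity data.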
Centromere capture
We described centromere capture events for the first time, in complex unbalanced karyotypes, where acentric segments from one or more chromosomes were preserved by joining to a centromere from a different chromosome [2, 18], and this concept was also later reported by Garsed et al. [19]. A centromere is necessary for stable inheritance and survival of a chromosome formed by the repair of broken chromosome segments [18]. Neocentromeres are functional centromeres created de novo by chromatin modification, and appear to perform a similar function, i.e. rescue of chromosomes that have no centromere [20]. Marker chromosomes with neocentromeres have been described in various sarcoma subtypes [20, 21].
Telomere capture is a similar concept that has been described in cancers and is well accepted [22, 23]. In this cell line we were able to match a short subtelomeric segment from 6q with a deleted chromosome 7 that had no apparent 7q telomere. On SNP array the subtelomeric segment acted as a proxy for the telomere.
We have previously described four centromere capture events: in two unbalanced translocations in the cell line HEL [18] and in two anachromosomes (chromosomes produced by chromothripsis) in a case of AML [2]. The present study identifies a further example of centromere capture: acentric segments from chromosomes 16 and 20 were identified in an abnormal chromosome, which had a centromere from chromosome 11.
These five examples of centromere capture were identified in highly rearranged genomes which we studied with a focus on identifying ambiguous centromeres. As this approach to chromosome characterisation is uncommon, centromere capture may be a significant feature of complex karyotypes. Centromere capture may provide a mechanism for the rescue of broken or shattered chromosome material, providing a selective advantage to the cancer cell [2]. If it provides a mechanism for preservation of oncogenes after chromothripsis or other chromosome breakage events, it may be much more common than these few cases indicate, since the identity of centromeres is not usually studied [2, 19, 21]. When there are multiple breakage and repair events occurring together, for example during chromothripsis [24], the surviving chromosome segments may simply be those that have joined to a segment containing a centromere. Deleted segments would therefore be those that do not re-join to segments containing a centromere and an appropriate telomere complement [18].
U937 Heritage
U937 was first described in 1976 [3], but the karyotypes of U937 sublines held in different laboratories varied considerably from one another by the time they were karyotyped in 1988 [6]. Shipley et al. [6] analysed G-banded chromosomes of three sublines held at different laboratories, U937-1, U937-2, and U937-3. The t(10;11), del(3q) and der(16)t(4;16) were common to all three sublines and were also present in our specimen, and there were several unresolved markers in each subline.
Several later publications [8-11, 25] refined the karyotype using different combinations of chromosomal CGH and FISH. The abnormal chromosomes described in all of these publications also included a der(1) and a der(5) from a translocation between chromosomes 1 and 5 (described as unbalanced in our study with evidence from microarray data and in another publication using chromosomal CGH [8], but balanced in other studies), a del(2p), a psu dic(3;1) (otherwise described as a dic(1;3) [8] or der(3)t(1;3) [9-11]), the der(6) with 6p amplification and a der(6)t(2;6). With the exception of the del(2p), these abnormalities were all described in the U937-1 karyotype of Shipley et al. [6, 9], and none were described in U937-2 or U937-3. This suggests that the sublines characterised in these later publications and the present study, sourced from both the ATCC (American Type Culture Collection) and the DSMZ (German Collection of Microorganisms and Cell Cultures) [8-11], were closer to each other and to U937-1 than to U937-2, or to U937-3, which was obtained from the laboratory that established U937 [3, 6].
There were several other abnormalities that were described in some studies only. Although some of these differences can be explained by different approaches to analysis, as described below, the detail of some suggests that they are true differences. For example, Cottier et al. [8] described a secondary translocation of the der(6)t(2;6) with chromosome 18; several authors reported a der(6)t(6;12) [8] or dic(6;12) [9, 11], which was not present in our subline. The del(1q), a fourth copy of chromosome 7 and a third copy of chromosome 22 (mosaic) were unique to our study. There was some additional mosaicism in our subline. This highlights the continuing evolution of cell line genomes in vitro. As a consequence, sublines held in other laboratories may differ in detail from the one described here.
Lee et al. [9] identified duplication of the 2q31->2q33 region in a der(2)dup(2)(q31q33)t(2;6)(q33;q21) by reverse chromosome painting (characterising abnormal chromosomes by labelling and hybridising them to normal metaphase spreads), a duplication that we also identified in our subline. However, they reported a subsequent unbalanced translocation with chromosome 6, a rearrangement not present in our specimen. (Both specimens shared a different 2;6 translocation.)
Refined and redefined abnormalities
Comparing the written and photographed karyotypes of the different publications is challenging, and it is not always clear which differences can be attributed to evolution and which to karyotyping inaccuracy. Like the fable of the Blind Men and the Elephant [26], abnormalities of the genome can be described and understood in different ways depending on the tools and the resolution obtained. The U937 genome has been the subject of several characterisations by G-banding, M-FISH, CGH, and/or SNP array [6, 8-11], and sequencing data are available [14]. Here we highlight some similarities and differences between our and published U937 karyotypes that can be explained by different approaches to analysis.
Descriptions of an abnormal chromosome characterised by different assays can be unrecognisable as the same chromosome. This is illustrated by the following two abnormal chromosomes.
One abnormal chromosome whose description has varied with the techniques used was first described by G-banding as a “del(17p)” by Shipley et al. [6]. We identified a der(20)t(15;20), which had a 20 centromere together with chromosome 15 and 20 material, by M-FISH, M-BAND and 20 centromere FISH. Identifying only the chromosome 15 material using a whole chromosome 15 paint, Lee et al. [9] identified a “der(15)” (i.e. a derivative chromosome with a 15 centromere). The inversion and deletion breakpoints that they gave this chromosome using CGH (comparative genomic hybridisation, a FISH technique using labelled cell line DNA pre-annealed to normal DNA to probe normal chromosomes, to identify copy number changes [7]) data are in good agreement with our SNP array data (see Fig. 4), but they did not identify the chromosome 20 content in this abnormal chromosome. Using both chromosome 15 and chromosome 20 paints and a 20 centromere probe, Matteucci et al. [11] identified it as a der(20) (i.e. having a 20 centromere) with elements of chromosomes 15 and 20 but without any detail on breakpoints.
Matteucci et al. [11] identified the chromosome 20 content of the der(11)t(11;16;20) using a chromosome 20 paint, and described it as a der(20). However, Lee et al. [9], using a chromosome 16 paint, identified it as a del(16q). Stefford et al. [10] identified both the chromosome 16 and chromosome 20 components of this chromosome with M-FISH, which identifies components of all chromosomes. The combination of SNP array, M-FISH and M-BAND enabled a cohesive and more accurate description of this chromosome. M-BAND data showed that the der(16)t(4;16) had the higher of two chromosome 16 breakpoints (Figs. 1, 4). Using B allele frequency and breakpoint data from SNP array we could distinguish between the two 16p breakpoints on different chromosomes in the U937 genome and ascertain that the corresponding der(4)t(4;16) and der(11)t(11;16;20) were derived from different chromosome 16 homologues, the der(4)t(4;16) being derived from the duplicated homologue. Following clues from SNP array data, we also showed by FISH that the der(11)t(11;16;20) contained an 11 centromere (Fig. 3a).
Using various techniques including M-FISH but not FISH for the 20 centromere, Cottier et al. [8] reported that their DSMZ-derived U937 subline had three normal chromosomes 20, and they did not identify any chromosome matching our subline’s chromosome 20-containing abnormal chromosomes (the der(20)t(15;20) and the der(11)t(11;16;20)); these might be absent in the DSMZ subline. Shipley et al. [25] described three chromosomes that were positive for an 11 centromere: the normal 11, an isochromosome i(11), and an E-group chromosome. The “isochromosome” matches the der(11)t(10;11) morphologically, and their E-group chromosome fits the description of our der(11)t(11;16;20), which is positive for ETS1. However, they did not identify ETS1 on this chromosome, nor did they identify it on the der(10)t(10;11), neither of which was known to contain chromosome 11 material at the time [27]. Gene localisation by tritiated in situ hybridisation is relatively insensitive and chromosome identification is difficult (personal observation), so that positive signals on an unexpected chromosome could easily have been missed (the authors discussed this possibility [6]).
Of interest, MacGrogan et al. [1] reported loss of heterozygosity at the 20q12 common deleted region (CDR) (they do not specify whether they used specimen from the ATCC or DSMZ) but three copies of the YAC 834H3 region, leading them to conclude that there had been loss of the CDR followed by reduplication from the other homologue. We cannot find mapping information for 834H3 but suggest either that it is not in the region that was lost, or, less plausibly, that reduplication occurred in the subline they tested but not in ours.
Independent 7q deletion producing no net loss of 7q
Trisomy 7 and/or deletion of 7q has been reported in most other U937 specimens [8, 9, 11], but only our specimen had a fourth copy of chromosome 7. Partial loss of 7q occurred twice, independently: once as a del(7q) in the largest clone, which had a der(6)del(6)dup(6), and once in a different, minor clone that did not have the der(6)del(6)dup(6), by unbalanced translocation of chromosome 7 with the other chromosome 6 homologue (Fig. 4, Table 2). The occurrence of 7q deletion twice independently is consistent with 7q deletion conferring a selective advantage to the cell. Deletion of 7q is a recognised recurrent myeloid deletion, but in this cell line loss of 7q from one of four copies of chromosome 7 produced partial loss of heterozygosity but no net deletion relative to the pseudotriploid background. This apparent paradox may be worth further investigation.
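The arithmetic behind this apparent paradox can be set out in a short sketch. The ploidy of three and the four copies of chromosome 7 are taken from the text; the 2:2 split of the two chromosome 7 homologues before the deletion is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def log2_ratio(copies: int, ploidy: int) -> float:
    """Copy number of a region relative to the baseline ploidy."""
    return math.log2(copies / ploidy)


def call_state(copies: int, ploidy: int) -> str:
    """Classify a region as gain, loss or neutral against the baseline."""
    if copies > ploidy:
        return "gain"
    if copies < ploidy:
        return "loss"
    return "neutral"


def minor_allele_fraction(copies_a: int, copies_b: int) -> Fraction:
    """Fraction of copies contributed by the less abundant homologue."""
    return Fraction(min(copies_a, copies_b), copies_a + copies_b)


PLOIDY = 3          # pseudotriploid background (from the text)
COPIES_BEFORE = 4   # four copies of chromosome 7 (from the text)
COPIES_AFTER = COPIES_BEFORE - 1  # deleted 7q region lost from one copy

print(call_state(COPIES_BEFORE, PLOIDY))  # gain
print(call_state(COPIES_AFTER, PLOIDY))   # neutral
print(log2_ratio(COPIES_AFTER, PLOIDY))   # 0.0

# ASSUMPTION: homologues split 2:2 before the deletion, 2:1 after it.
print(minor_allele_fraction(2, 2))  # 1/2
print(minor_allele_fraction(2, 1))  # 1/3
```

Under these assumptions the deleted region returns to the baseline copy number of three, so it would not register as a net deletion, while the homologue ratio shifts away from balance, which is what a partial loss of heterozygosity would look like in the B allele frequency data.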
Analysis of complex genome reorganisation
Large high-throughput studies of cancer cell lines are producing publicly available expression, copy number and sequence data, and are a valuable resource for understanding cell line biology [28-30]. Standard sequencing technologies cannot yet analyse regions of highly repetitive DNA [31]. Nor do cytogenomic microarrays give information on chromosome organisation. Metaphase FISH is a single cell analysis tool which can help fill in some of these gaps. More recently, optical mapping [32] and nanopore sequencing [33-37] are making the description of highly complex karyotypes more comprehensive, and these will allow the exploration of chromosome rearrangements with greater resolution, including long read sequencing across centromeres. One advantage of our approach is that the distribution of the homologues can readily be interpreted. It is also more accessible at the present time, and targeted portions of the genome can be examined as needed.
A viable chromosome has two telomeres and at least one centromere. To help build a picture of the abnormal chromosomes, ideally an abnormal chromosome will include two subtelomeric segments identified by SNP array data, which account for the two telomeres. However, as the telomeres are highly repetitive and are not themselves represented on the array, the subtelomere cannot always be used as a proxy for the telomere. For example we assume the der(2)dup(2) has a telomere, even though the 2q subtelomere has been lost (Fig. 4); and we found the subtelomere of 8p to be duplicated on an apparently normal chromosome 8 in our U937. There were several chromosomes without obvious telomeres (i.e. without two subtelomeric segments), including the del(1), the der(2)dup(2), the del(2p), the der(6)del(6)dup(6), and the der(20)t(15;20). Centromere FISH performed on metaphase chromosomes can identify centromeres. If one of the SNP array segments in a chromosome does not contain a centromere or join to a chromosome segment with a centromere, centromere capture or a neocentromere should be suspected.
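The two checks described above (two subtelomeric ends, at least one centromere) can be expressed as a simple review step over the SNP array segments assigned to an abnormal chromosome. This is a minimal sketch only: the `Segment` record, the `review_chromosome` function and the example segments labelled chrA and chrB are hypothetical and do not represent data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One SNP array segment assigned to an abnormal chromosome."""
    chrom: str              # chromosome of origin
    has_centromere: bool    # segment spans the centromere
    subtelomeric_ends: int  # number of subtelomeric regions included (0-2)


def review_chromosome(segments: list[Segment]) -> list[str]:
    """Return notes on features that need follow-up, e.g. by FISH."""
    notes = []
    if sum(s.subtelomeric_ends for s in segments) < 2:
        notes.append("fewer than two subtelomeric segments: "
                     "telomeres not accounted for by array data")
    if not any(s.has_centromere for s in segments):
        notes.append("no centromere-bearing segment: "
                     "suspect centromere capture or a neocentromere")
    return notes


# Hypothetical chromosome built from two acentric segments.
candidate = [
    Segment("chrA", has_centromere=False, subtelomeric_ends=1),
    Segment("chrB", has_centromere=False, subtelomeric_ends=0),
]
for note in review_chromosome(candidate):
    print(note)

# Hypothetical intact chromosome: one segment, centromere, both ends.
print(review_chromosome([Segment("chrA", True, 2)]))  # []
```

A note from this kind of review is a prompt for further work rather than a conclusion: as discussed above, a missing subtelomeric segment does not prove that the telomere is absent, and only centromere FISH on metaphase chromosomes can show which centromere, if any, the chromosome carries.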
In 2013, two comprehensive studies of the complex and widely used HeLa genome [38, 39] were published. In one of these studies, Adey et al. [38] used haplotypes of isolated chromosomes, allele ratios and mate-pair sequencing to distinguish between the different chromosome homologues in the abnormal chromosomes and determine the probable structure of marker chromosomes, although the centromere content of the marker chromosomes was not identified. This haplotype information importantly allowed the authors to conclude that MYC was cis-activated by the inserted HPV18 (human papilloma virus) DNA in this cervical cancer cell line. As in our study, this is an example where distinguishing between alterations on the two alternative homologues can provide information on how the genome changes arose.
The present study is valuable as a demonstration of the analysis of complex rearrangements, and also of the evolution and main features of the U937 genome. However, it cannot be a definitive picture of the U937 genome due to its continuing evolution, as demonstrated by the variation between different sublines and the examples of mosaicism in this subline. Landry et al. [39] predicted that in future, cell line genomes will be routinely characterised so that changes can be identified and studies of cellular processes can be related to the actual genome rather than the reference genome. Studying how genomes in cell lines, cancers and mouse models of cancer are remodelled should help us understand the processes of karyotype evolution. We advocate the use of a variety of complementary methods to characterise abnormalities and identify the processes occurring during karyotype evolution.