We have shown that the NRG1 fusion of MDA-MB-175 is more complex than previously described: it is a double fusion, PPP6R3-TENM4-NRG1, with multiple alternative transcripts, some including the cytoplasmic tail, and it results from complex genomic rearrangements. We also confirmed that similar fusions (coding sequence of another gene splicing into genomic exon 3 of NRG1) are found in breast cancers, supporting the use of this fusion as a model example. The structure of these fusions has implications for clinical identification of NRG1 fusions, for understanding the subcellular location and secretion of NRG1 fusion proteins, and for explaining their oncogenicity.
Identifying NRG1 fusions is challenging
Our search for NRG1 fusions in breast cancer cases, and the complexity of the fusion in MDA-MB-175, illustrate that identifying NRG1 fusions in clinical cases is not straightforward. Although we found 4 examples in 572 cases (0.7%), in rough agreement with the 2/120 found by Kim et al  but a substantially higher prevalence than others [8,9], there might well have been more. None of the four fusions in cancers had been called from RNA sequencing alone by the fusion detection software STAR-Fusion, presumably because of insufficient read coverage: in two of the four cases, our RNA sequencing yielded only one or two split fusion reads (Supplementary Table 3). Without corresponding evidence from DNA sequencing these fusions would not have been discovered. Similarly, the DNA rearrangements alone would not have specifically identified these cases. The prediction of the ZNF704-NRG1 fusion was tentative because the genomic rearrangement is complex. A plausible reconstruction (Supplementary Figure 8) is that, in addition to the ZNF704-NRG1 junction, there is a tandem duplication of about 57 kb of NRG1, encompassing the unused exon 7, with insertion of 24 kb of inverted sequence into the duplication junction. Furthermore, splicing into exon 3 of NRG1 is from an undocumented exon in ZNF704, so the fusion would not have been found in the RNA sequencing using software that assumed splicing would be to known splice sites. A further 20 of the first 250 breast cancer cases had breakpoints within NRG1 by DNA sequencing but no detectable fusion; 13 of these had multiple breakpoints, which would make fusion prediction difficult (Supplementary Table 4). No anomalous splicing into NRG1 was detected in these cases. MDA-MB-175 itself is a case in point: the complex rearrangements of NRG1, with 7 breakpoints called within NRG1 (Fig. 2), would not have led to a confident prediction of a fusion.
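The "rough agreement" between the two prevalence figures quoted above (4/572 and 2/120) can be illustrated with a minimal sketch that puts an approximate 95% interval around each proportion. This is an illustration only, not an analysis performed in this study: the choice of the Wilson score interval and the helper name `wilson_interval` are assumptions made for the example, and only the two pairs of counts come from the text.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Return the (low, high) Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% interval (an assumption of
    this sketch, not a value taken from the study).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Counts quoted in the text: 4/572 (this study) and 2/120 (Kim et al).
for label, k, n in [("this study", 4, 572), ("Kim et al", 2, 120)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.2%}, interval {low:.2%} to {high:.2%}")
```

With counts this small both intervals are wide and they overlap, which is all that "rough agreement" can mean here; the sketch says nothing about fusions that went undetected.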
The importance of correct interpretation is underlined by the probability that some NRG1 rearrangements, presumably including the out-of-frame fusions, are inactivating events, as discussed below. Our  and others’  estimates that around 5% of breast cancers have breaks within NRG1 by FISH probably include many cases where there is no fusion. In conclusion, as noted before [6,8-10], RNA analysis is probably necessary, and combining it with DNA sequencing improves sensitivity and specificity; but even with both, sensitive identification of fusions is challenging.
Structure of the MDA-MB-175 fusion
The fusion partners TENM4 and PPP6R3 have not been seen in NRG1 fusions in tissue samples, but this is not surprising, because there are already upwards of 30 known fusion partners (e.g. ). TENM4, teneurin4, has been identified as a probable driver target of structural variation, notably in breast  and Fig. 3b of ref.  and its relative TENM1/ODZ1 was identified as an oncogene target of the mouse mammary tumour virus (MMTV) . It is a transmembrane protein with a cytoplasmic N-terminus and a large extracellular domain, most of which is lost in the fusion (Fig. 3).
An important feature of the PPP6R3-TENM4-NRG1 fusion is that, paralleling wild-type NRG1, we found multiple isoforms, including isoforms with the cytoplasmic tail (Fig. 1). The original cDNA cloned by Schaefer et al  lacked the cytoplasmic domain of NRG1, and it has often been assumed that this was a feature of NRG1 fusions in general. The isoforms we found (Fig. 1) all had the Ig-like domain, and they included both alpha and beta forms (alternative exons 10 and 11). Some had the full transmembrane and cytoplasmic C terminus, designated 1a and 2a forms , while others, including the original cDNA of Schaefer et al , terminated in an extended exon 11, designated -β3.
Alternative splicing of other NRG1 fusions
Many fusions have been presented as lacking the C-terminal, cytoplasmic exons, and terminating in the β3, non-transmembrane terminus (genomic exon 11ext). But the multiple splice forms in MDA-MB-175 suggest that these other fusions will also come in multiple isoforms, including forms with the cytoplasmic tail. Their absence from the literature is probably an oversight: partly a legacy of the original reports [4,11]; and partly technical, because short-read sequencing only shows the fusion junction, not downstream splicing patterns, and PCR or single-primer amplification of cDNA has often used primers in the β3 terminus (extended exon 11) or the EGF-like domain (exon 9) (e.g. [5, 9]). Our RNA sequencing shows expression of the cytoplasmic tail exons, clearly in two cases (FAM91A1-NRG1 and ARHGEF39-NRG1) but not conclusively in the other two, where there were too few reads to be confident. However, we cannot tell whether these reads are from fusion transcripts, because there is also expression from the normal transcription start site, genomic exon 2 (possibly from contaminating normal tissue).
Further confusion arises because some NRG1 fusions have been described as derivatives of NRG1 TypeIII-β3, but this is misleading: no fusions involve the transcription start site, genomic exon 7, that defines TypeIII neuregulins/heregulins; and many of the fusions include the Ig domains which are not in TypeIII-β3 .
It has also been assumed that the form of NRG1 secreted into the medium by MDA-MB-175 is encoded by the original cDNA of Schaefer et al  but this may not be correct—it might be a cleaved fragment of a transmembrane isoform (Fig. 3).
Oncogenic function of NRG1 fusions is paradoxical
The oncogenic function of NRG1 fusions is paradoxical and remains to be fully explained. The fusions apparently form an autocrine loop, stimulating the co-expressed ERBB3-ERBB2/HER2 heterodimer [12,13]. But normal epithelia produce both NRG1 and its receptors [23,33], so why would NRG1 fusions be oncogenic? Moreover, NRG1 expression is pro-apoptotic when cDNAs are transfected into cells, including the breast cancer cell line MCF7 .
A possible resolution of this puzzle would be that NRG1 and its ERBB-family receptors are, in normal epithelium, produced by different cells, and/or on different faces of the cell , with co-expression in the same cell prevented by strong controls—perhaps leading to the apoptotic activity of transfected NRG1 .
So why are NRG1 fusions oncogenic? One previous hypothesis was that the cytoplasmic domain of NRG1 is pro-apoptotic and is absent from the PPP6R3-TENM4-NRG1 fusion ; our analysis rules this out.
We suggest two alternative explanations: alteration of expression, or alteration of subcellular localisation. Simplest would be altered regulation of NRG1 expression, by placing it downstream of an unrelated promoter, allowing one cell to express ligand and receptor. This would be consistent with the wide range of fusion partners.
Loss of nuclear signalling?
A more intriguing hypothesis is that the fusion proteins have a different subcellular distribution, and, specifically, that one route of nuclear signalling is lost.
NRG1 encodes many isoforms and proteolytically-cleaved forms, which may be secreted, membrane-bound, cytoplasmic or nuclear [1,35]. Among these, two entirely unrelated forms can signal to the nucleus: the cleaved cytoplasmic tail, and the TypeI-β3 form, which includes the Ig-like and EGF-like domains (Fig. 3). The latter is intracellular because it lacks a transmembrane domain or signal sequence [2, 35], and it has been shown to translocate to the nucleus and alter gene expression [36,37]. Translocation is mediated by sequences around the Ig-like domain [36,38] (Breuleux et al  used a truncated ‘heregulin-alpha’ cDNA that lacked a transmembrane domain).
The PPP6R3-TENM4-NRG1 fusion proteins consist of the intracellular part of TENM4 and its transmembrane domain, joined to a range of essentially intact NRG1 isoforms: the only exons of NRG1 lost are the first two transcription-start exons (Fig. 3). Thus, TENM4 brings a transmembrane domain to the fusion, and the TypeI-β3 forms that would normally be intracellular presumably become extracellular (Fig. 3). Similarly, as noted by others (e.g. ), several fusion partners bring a transmembrane domain, including two of the commonest, CD74 and SLC3A2. Other fusion partners have a signal sequence, e.g. SDC4 , CLU , ADAM9 , or fuse with loss of the Ig-like domain, e.g. some CD74 fusions, RBPMS, TSHZ2 , again denying β3 forms access to the nucleus. However, this is not a universal feature of the fusions, e.g. the FOXA1 and ROCK1 fusions , so it may not be essential for oncogenicity.
NRG1 can be a tumour suppressor or oncogene
Although NRG1 appears to be oncogenic in some tumours, it is inactivated in carcinomas at least as often as it is activated. NRG1 is silenced by methylation in some breast and other carcinomas [23,39,40], and seems to be at least one target of distal 8p loss, which is one of the most frequent large-scale events in carcinomas . Many of the rearrangements in NRG1 appear not to fuse the gene, or create a fusion that lacks the EGF-like, receptor-binding domain, or are simply out of frame. Examples include a deletion in a breast cancer that removes the ligand-binding domain  and three further inactivating deletions ; fusions that retain only the 5’ end of NRG1, e.g. two described by Drilon et al ; and 3’ fusions that splice in at the transmembrane domain . Of 16 NRG1 fusions found in TCGA RNA-seq data by Hu et al , only 6 appeared to be in-frame fusions of 3’ NRG1 that included the EGF-like domain: four retained only the 5’ end, and most of the others appeared out-of-frame. Many of the rearrangements of NRG1 that we found in breast cancers did not, or were unlikely to, create an activating fusion (Supplementary Table 4), including the two fusions that were out of frame.
This dual role could arise because high ERBB3 activity can be achieved in two ways: either NRG1 is inactivated, to permit high ERBB3 activity in all cells or at both faces, or to prevent NRG1’s pro-apoptotic activity  (which may be a manifestation of the same control); or NRG1 can form an oncogenic autocrine loop, if the controls preventing co-expression can be broken. The lack of nuclear signalling by the Ig-like domain might be part of the control mechanism.
Whether or not this is the explanation, because many NRG1 rearrangements seem to be inactivating, the correct identification of activating fusions may require care.