3.1. CNVs in IDCs and LNmets compared to NAT samples
CNV analysis was performed on the study cohort of 23 IDCs, 12 LNmets and 7 NAT samples. Compared with NAT, this analysis identified 4709 CNV segments in IDCs, of which 65% were gains and 35% were losses (Table S1). Similarly, 1725 CNV segments that were not present in NATs were identified in LNmets, of which 58.9% were gains and 41.1% were losses (Table S2). All CNV segments ranged from more than one kilobase to several megabases in size.
Similarly, in the second cohort of 70 TNBC IDCs and the 7 NAT samples, more regions were amplified than deleted. In total, 15597 CNV segments that were not present in NATs were identified in IDCs, of which 61.59% were gains and 38.4% were losses (Table S3).
3.2. Gene annotation of CNVs in IDCs of the study and second cohorts
In IDCs of the study cohort, the most recurrently amplified regions were in large areas of 8q (such as 8q24.21, 8q23.3, 8q24.13, 8q23.2-8q23.3, 8q23.3-8q24.11, 8q24.11, 8q24.11-8q24.12, 8q24.13-8q24.21, 8q24.22-8q24.23, 8q24.3 and 8q22.3) (Table S4.a), followed by 1q (such as 1q21.2, 1q21.3, 1q21.2-1q21.3, 1q23.3 and 1q24.2), 10p (10p15.3, 10p15.2-10p15.1), 12p13.1-12p12.3 and 2q33.1. After these regions were overlapped with gene annotation data, 594 genes were identified in 8q that were amplified in more than 50% of the 23 samples (Table S4.b). In Figure 1A, the histogram height shows that the majority of samples shared amplification (red) across the 8q region. By contrast, only 78 genes in 1q, 7 in 2q33.1, 29 in 10p15.1-10p15.3 and 10 in 12p13.1-12p12.3 were amplified in more than 50% of the 23 samples. The most recurrently deleted region was 14q24.1, followed by 14q21.1, 19p13.3, 4q34.1, 5q13.2 and 5q32 (Table S5.a). Figure 1A shows more deletions (blue) across the q arms of chromosomes 4, 5 and 14 and across 19p than across other chromosomal regions. However, deleted regions were shared among fewer samples than amplified regions. Four genes in 14q24.1, two in 14q21.1, two in 19p13.3, two in 4q34.1, two in 5q13.2 and five in 5q32 were highly recurrent, observed in more than about 40% of the samples (Table S5.b).
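The recurrence thresholds used above (for example, genes amplified in more than 50% of samples) can be illustrated with a minimal Python sketch. The function name, the input layout (one set of altered genes per sample) and the placeholder gene names are assumptions made for illustration only; they do not describe the software actually used in this study.

```python
from collections import Counter

def recurrent_genes(calls_per_sample, min_fraction=0.5):
    """Return {gene: fraction of samples} for genes altered in more than
    `min_fraction` of samples.

    calls_per_sample: dict mapping a sample ID to the set of genes that
    carry the alteration (e.g. amplification) in that sample.
    This layout is a hypothetical one chosen for the example.
    """
    n_samples = len(calls_per_sample)
    # Count each gene at most once per sample.
    counts = Counter(g for genes in calls_per_sample.values() for g in set(genes))
    return {
        gene: count / n_samples
        for gene, count in counts.items()
        if count / n_samples > min_fraction
    }

# Hypothetical toy input: four samples with placeholder gene names.
toy_calls = {
    "S1": {"GENE_A", "GENE_B"},
    "S2": {"GENE_A"},
    "S3": {"GENE_A", "GENE_C"},
    "S4": {"GENE_B"},
}
recurrent = recurrent_genes(toy_calls)
```

In this toy input only GENE_A passes, because the comparison is strict: GENE_B is altered in exactly half of the samples and is therefore not "more than 50%".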
To investigate whether genes overlapping the amplified and deleted regions in IDCs of the study cohort were associated with specific functional groups and pathways, GO-enrichment and pathway analyses were used. According to the enrichment scores, “production of molecular mediator of immune response”, “antigen binding”, “immunoglobulin production”, “complement activation” and “regulation of protein activation cascade” were the most enriched GO terms (Table S6.a), and “ribosome biogenesis in eukaryotes” was the most enriched pathway in the list of genes contained within the amplified regions (Table S6.b). In concordance with this finding, upregulation of “gene networks related to ribosome biogenesis” has previously been demonstrated in the MDA-MB-231 TNBC cell line (31). The genes overlapping the deleted regions were highly enriched in GO terms such as “neuron differentiation”, “cardiac ventricle morphogenesis”, “cardiac chamber morphogenesis”, “system development” and “regulation of hormone levels” (Table S6.c). No pathways were significantly enriched in the list of deleted genes.
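For readers unfamiliar with enrichment analysis, the sketch below shows one common way to score over-representation of a GO term or pathway in a gene list: a one-sided hypergeometric test. This is an assumption for illustration; the text does not state which statistic underlies the reported enrichment scores, and the function name and toy numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    N: number of genes in the background
    K: number of background genes annotated with the term
    n: number of genes in the query list (e.g. amplified genes)
    k: number of query genes annotated with the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical toy case: background of 10 genes, 5 annotated with the term,
# query list of 5 genes, all 5 of which carry the annotation.
p_toy = hypergeom_enrichment_p(k=5, n=5, K=5, N=10)
```

In practice such p-values are computed for every term and then corrected for multiple testing; that step is omitted here.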
In the second cohort, the most frequently amplified regions were 3q24, 2p15, 6p22.1, 8q (8q24.3, 8q11.1, 8q23.2-8q23.3), 1q (1q44, 1q42.2), 10p15.3 and 4q28.3 (Figure 2, Table S7.a). At the gene level, 47 genes in 8q, 43 in 1q, 15 in 10p15.3, 5 in 4q28.3, 3 in 3q24, 1 in 2p15 and 1 in 6p22.1 were amplified in more than 50% of samples. The most frequently deleted regions were 17p13.1 and 3p21.31. Only 8 genes in 17p13.1 and 7 in 3p21.31 were deleted in more than 39% of the samples (Table S7.b).
Similar to the study cohort, the most significantly overrepresented GO terms among the amplified genes in IDCs of the second cohort were “antigen binding”, “immunoglobulin production”, “production of molecular mediator of immune response” and “complement activation”, and “ribosome biogenesis in eukaryotes” was a significantly enriched pathway (Table S8.a and S8.b). In the list of deleted genes, “neutrophil mediated cytotoxicity”, “neutrophil mediated killing of symbiont cell”, “neutrophil mediated immunity” and “killing by host of symbiont cells” were the most enriched GO terms (Table S8.c).
Interestingly, 2601 of the 8943 (29%) CNV-associated genes in the study cohort were also detected in the second cohort, showing good concordance between the cohorts. Of the 2601 genes, 2535 were amplified and 66 were deleted in both cohorts. Of the 2535 amplified genes, 1599 (63%) were located on 1q, followed by 303 on 8q, 282 on chromosome 19 (p and q), 114 on 2p and 68 on 5p, while the remaining genes were distributed across 3q, 4q, 6p, chromosome 7 (p and q), 10p, 12p, 17q, 18p and 20q (Table S9.a). Of the 66 genes deleted in both cohorts, 37 were located on 8p, followed by 22 on 5q and 3 in 19p13.3, while the rest were located in 3p21.31, 4q32.3, 14q13.2 and 17p13.1 (Table S9.b).
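The cohort comparison above amounts to a set intersection followed by simple proportions. The following Python sketch illustrates this with placeholder gene names and then re-derives the rounded percentages from the counts quoted in the text; the function names and toy gene sets are hypothetical and are not taken from the study data.

```python
def shared_genes(cohort_a, cohort_b):
    """Genes reported as CNV-associated in both cohorts."""
    return set(cohort_a) & set(cohort_b)

def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# Hypothetical toy gene sets with placeholder names.
toy_study = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
toy_second = {"GENE_B", "GENE_D", "GENE_E"}
toy_shared = shared_genes(toy_study, toy_second)

# Counts quoted in the text, used only to check the reported percentages.
shared_pct = percent(2601, 8943)         # shared genes among study-cohort genes
chr1q_pct = percent(1599, 2535)          # shared amplified genes located on 1q
counts_consistent = (2535 + 66 == 2601)  # amplified + deleted = shared total
```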
The most enriched GO terms in the list of amplified genes were “immunoglobulin production” and “antigen binding”, and “ribosome biogenesis in eukaryotes” was the enriched pathway (Table S10.a and S10.b). In the list of deleted genes, “neutrophil mediated cytotoxicity”, “cellular extravasation” and “regulation of chemokine biosynthetic process” were the most enriched GO terms (Table S10.c).
3.3. Gene annotations of CNVs in LNmets
The most frequently amplified regions in the LNmets were 4q28.3, 2p (2p15, 2p11.2-2p11.1), 3q24, 1q21.2, 10p (10p15.3, 10p15.2), 12p11.1, 8q (8q11.1, 8q21.13-8q21.2, 8q24.21, 8q23.3), 20p11.22-20p11.21, 21q22.13 and 6p22.1 (Table S11.a, Figure 1B). At the gene level, 2p contained the highest number of genes amplified in more than 50% of cases (105 genes; Table S11.b), followed by 19 in 8q and 15 in 20p11.22-20p11.21; 3 genes in 1q21.2, 6 in 10p15.3, 5 in 12p11.1, 2 in 21q22.13, 4 in 3q24, 6 in 4q28.3 and 3 in 6p22.1 were also amplified in more than 50% of samples. The most frequently deleted regions were 1p36.23, 4q21.1 and 5q (5q11.2, 5q23.2), each deleted in more than 39% of the samples. Similar to IDCs, fewer deleted regions than amplified regions were shared among multiple samples. There were 28 genes in 5q, 7 in 4q21.1 and 5 in 1p36.23 deleted in more than 39% of samples (Table S11.c).
In the list of amplified genes, the most enriched GO terms were “production of molecular mediator of immune response”, “immunoglobulin production”, “antigen binding”, “regulation of protein activation cascade” and “regulation of complement activation”, and “ribosome biogenesis in eukaryotes” was the most enriched pathway (Table S12.a, S12.b). The list of deleted genes showed the highest enrichment in GO terms such as “flavonoid glucuronidation”, “flavonoid metabolic process”, “cellular glucuronidation” and “glucuronate metabolic process”. Highly enriched pathways in the list of deleted genes were mainly involved in metabolism, such as “pentose and glucuronate interconversions”, “steroid hormone biosynthesis”, “drug metabolism”, “chemical carcinogenesis” and the “estrogen signaling pathway” (Table S12.c, S12.d).
3.4. Genes within CNV regions associated with the progression from IDC to LN metastasis
We next determined the genes associated with CNVs in the LNmets to identify changes related to the progression of primary TNBC to metastasis in the study cohort. To do this, we first identified all CNV regions in each group and the genes associated with those regions. We then compared the amplified and deleted genes among three groups using Venn diagrams: Group 1, lymph node positive IDC (IDC LN+) (n=10); Group 2, lymph node negative IDC (IDC LN-) (n=13); and Group 3, lymph node metastases (LNmets) (n=12). With this comparison, we aimed to identify genes with copy number alterations that were common to IDC LN+ and LNmets but not present in IDC LN-, and were therefore potentially associated with metastasis (Figure 3A and 3B).
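The three-group Venn comparison described above corresponds to a simple set operation: genes altered in both IDC LN+ and LNmets, minus genes also altered in IDC LN-. A minimal Python sketch follows; the function name and placeholder gene names are hypothetical and are used only to make the logic explicit.

```python
def metastasis_candidates(idc_ln_pos, idc_ln_neg, lnmets):
    """Genes altered in both IDC LN+ and LNmets but not in IDC LN-.

    Each argument is a collection of gene names carrying the same type
    of alteration (all amplified, or all deleted) in that group.
    """
    return (set(idc_ln_pos) & set(lnmets)) - set(idc_ln_neg)

# Hypothetical toy gene sets with placeholder names.
toy_ln_pos = {"GENE_A", "GENE_B", "GENE_C"}
toy_ln_neg = {"GENE_B", "GENE_D"}
toy_lnmets = {"GENE_A", "GENE_B", "GENE_E"}

candidates = metastasis_candidates(toy_ln_pos, toy_ln_neg, toy_lnmets)
```

The comparison would be run once for amplified genes and once for deleted genes, so that the two alteration types are never mixed within one set.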
We identified 441 amplified genes located on 1q, 5p, chromosome 6 (p and q), 7q, chromosome 8 (p and q), 17q and 20q that were common to IDC LN+ and LNmets. Interestingly, 365 of 441 (83%) genes were located on the q arm of chromosome 1, whilst 30 of 441 (7%) genes were located on chromosome 6 (6p22.1, 6p24.3-6p24.2 and 6q21) and 26 of 441 (6%) genes in the 17q region (17q23.3 and 17q25.3) (Table S13.a). Two hundred and forty-five deleted genes, located on 5q, 6p, 8p, 12q, 14q, 17q and 19p, were common to both IDC LN+ and LNmets. Of these, 146 of 245 (60%) deleted genes were located on 8p, followed by 50 on 5q (20.4%) and 32 on 14q (13.06%), with each of the other regions encompassing fewer than 10 genes. The CNV-altered genes that are present in both IDC LN+ and LNmets but not in IDC LN- are potentially involved in metastatic TNBC disease (Table S13.b).
In GO-enrichment and pathway analysis, the list of amplified genes showed the highest enrichment in GO terms including “regulation of complement activation”, “protein activation cascade”, “regulation of acute inflammatory response”, “regulation of protein processing and maturation” and “humoral immune response”, and in pathways including “complement and coagulation cascades” and “oxytocin-signalling pathway” (Table S14.a, S14.b). For the deleted regions, “TRAIL binding” was the most enriched GO term, while “estrogen signalling pathway” and “cytokine-cytokine receptor interaction” were significantly associated with these regions (Table S14.c, S14.d).
3.5. Integration of CNVs with gene expression analyses
CNV data were integrated with previously published gene expression data (GEO Accession: GSE61723) to determine whether changes in mRNA expression resulted from the CNVs (25). However, very few differentially expressed genes were linked to CNVs in the study cohort.
In the IDCs of the study cohort, 33 of 185 (18%) genes differentially expressed in IDC vs NAT were copy number altered: 29 significantly upregulated genes were amplified and 4 significantly downregulated genes were deleted (Figure 4A and 4B; Table S15).
In the LNmets, 18 of 165 (10.9%) genes differentially expressed in LNmet vs NAT showed copy number alterations: 5 upregulated genes were amplified and 13 downregulated genes were deleted (Figure 4C and 4D; Table S16).
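The integration step above counts only direction-concordant genes: upregulated genes that are amplified, and downregulated genes that are deleted. The Python sketch below illustrates that matching under the assumption that each gene list is available as a simple set of names; the function name and toy genes are hypothetical. The last lines re-check the proportions quoted in the text.

```python
def concordant_genes(upregulated, downregulated, amplified, deleted):
    """Return (up_and_amplified, down_and_deleted) gene sets.

    Genes whose expression change runs against the copy number change
    (e.g. downregulated but amplified) are not counted as concordant.
    """
    up_amp = set(upregulated) & set(amplified)
    down_del = set(downregulated) & set(deleted)
    return up_amp, down_del

# Hypothetical toy gene sets with placeholder names.
toy_up = {"GENE_A", "GENE_B"}
toy_down = {"GENE_C", "GENE_D"}
toy_amp = {"GENE_A", "GENE_C"}   # GENE_C is downregulated yet amplified
toy_del = {"GENE_D", "GENE_E"}

up_amp, down_del = concordant_genes(toy_up, toy_down, toy_amp, toy_del)

# Counts quoted in the text, used only to check the reported proportions.
idc_pct = round(100 * (29 + 4) / 185)         # IDC vs NAT
lnmet_pct = round(100 * (5 + 13) / 165, 1)    # LNmet vs NAT
```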
Our previous study identified 28 TNBC-specific genes that were differentially expressed in IDCs vs NAT in the study cohort but not in non-TNBC IDCs (25). In the current study, 3 of the 28 TNBC-specific genes with upregulated expression (ANKRD36BP1, ANP32E, MYBL1) were amplified in IDCs of the study cohort, whereas TBC1D9 and TMEM144, whose expression was downregulated, were deleted in the IDCs of the study cohort.
Additionally, we investigated whether the 441 genes amplified in both IDC LN+ and LNmets and the 245 genes deleted in both IDC LN+ and LNmets in the current study also showed differential expression in IDC LN+ vs NAT. To do this, we compared these genes with the 104 genes that were differentially expressed in IDC LN+ vs NAT; this list of 104 genes is an unpublished result from our previous study (25). Only three amplified genes (ASPM, KIF14, LEMD1) were upregulated and three deleted genes (SNORD113-2, SNORD113-3, SNORD113-4) were downregulated in IDC LN+ vs NAT (Table 1).
3.6. CNVs associated with prognosis
We evaluated the prognostic value of the LNmet-associated genes located in regions of CNV that showed a corresponding change in mRNA expression in the study cohort. Relapse-free survival (RFS) analysis was performed for the three of 441 amplified genes and the three of 245 deleted genes that were associated with LNmets and showed a corresponding change in mRNA expression. Of the three amplified genes, high expression of ASPM and of KIF14 was significantly associated with worse RFS (Figure 5), while high expression of LEMD1 showed a non-significant trend towards increased RFS. No survival information was available for the three deleted genes (SNORD113-2, SNORD113-3, SNORD113-4).