Study population and clinical characteristics
We enrolled 77 de novo cDLBCL cases with matched normal tissues for WES and 10 cDLBCLs for WGS. A subset of the 77 dogs (n=43) were also analyzed for RNA-seq.
Complete patient demographics and clinical presentation are described in Supplementary Data 1. Mixed-breed dogs (n=19; 24.7%), German shepherds (n=9; 11.7%), Rottweilers (n=7; 9.1%) and Golden retrievers (n=4; 5.2%) were the most common breeds. There were 40 (51.9%) females and 37 (48.1%) males. At diagnosis, median age was 7 years (range 3-15 years) and median weight was 30.8 kg (range, 4.5-81.3 kg). Regarding clinical stage, 48 (62.3%) dogs had stage V disease, 28 (36.4%) had stage IV and only one (1.3%) dog had stage III disease. Forty-nine (63.6%) dogs were asymptomatic at presentation (substage a), while 28 (36.4%) showed symptoms (substage b). Overall, 22 (28.6%) dogs had bone marrow infiltration, with a median of 3% neoplastic cells (range, 1-50%). Peripheral blood was infiltrated in 43 (55.8%) dogs, with a median of 4% neoplastic cells (range, 1-74%). At presentation, 33 (42.9%) dogs had an increased level of serum LDH and 23 (29.9%) had received steroids before being diagnosed with lymphoma. Treatment significantly affected both time to progression (TTP) and lymphoma specific survival (LSS) (Fig. 1a, b). Indeed, dogs treated with chemo-immunotherapy (n=45; 58.4%) showed a better outcome compared to dogs receiving chemotherapy only (n=32; 41.6%). In these dogs, shorter LSS was significantly associated with substage and bone marrow infiltration. In dogs receiving chemotherapy only, peripheral blood infiltration correlated with both TTP and LSS, whereas bone marrow infiltration correlated with LSS only (Supplementary Data 2).
Landscape of somatic mutations in cDLBCL
Based on WES, the median sequencing depth of targeted regions was 265 (range 140-394) for tumors and 246 (range 110-625) for normal samples, with a mapping rate of 99%. Collectively, the total number of short somatic variants identified across tumors ranged from 93 to 2,899, with an average of 282. Of these variants, 10.3% to 28.7% were annotated as protein coding variants with an average of 18.4%, including 4.9% insertions and deletions (indels) and 95.1% single nucleotide variants (SNVs) (Fig. 2). Among the latter, 68.5% were missense (range, 47.3%-84.3%) (Supplementary Fig. 1). By SIFT, 1,866 were classified as deleterious and 1,220 as tolerated. The full list of the nucleotide variants is reported in Supplementary Data 3.
cDLBCL is characterized by recurrent mutations in specific protein-coding genes
A total of 2,831 protein coding genes showed a nonsynonymous somatic variant in at least one tumor for a total of 3,769 protein coding variants, and 2,368 genes were mutated only in one sample. More importantly, eight genes (TRAF3, SETD2, POT1, TP53, MYC, FBXW7, DDX3X, and TBL1XR1) were recurrently mutated in at least 15% of the dogs. The top 43 most frequently mutated genes are shown in Fig. 3. Several candidate cancer genes were previously identified as genetic drivers in canine cancers (TRAF3, SETD2, POT1, TP53), but others have never been reported in dogs (TBL1XR1, H3C8, DIAPH2, EHD3).
In concordance with previous studies, TRAF3 was the most frequently mutated gene in our cohort (53%). A total of 62 mutations were identified and 16 dogs carried multiple aberrations. Of these mutations, 41.9% and 29% were frameshift and nonsense variants, respectively, while 25.8% were missense (Supplementary Table 1). Exon 11 (ENSCAFT00000028719.4) was the most affected with a total of 31 mutations (Fig. 4a). The second most frequently mutated gene was SETD2. We identified 29 somatic mutations in 24 dogs (31%). Nonsense variants were the most frequent (37.9%), followed by frameshift and missense variants (31%) (Fig. 4b and Supplementary Table 1). POT1 mutations were identified in 28.6% of cases. Missense mutations represented 54.5% of variants, while nonsense and frameshift mutations were 27.3% and 18.2%, respectively (Fig. 4c and Supplementary Table 1). Interestingly, three dogs carried the same mutation (G653R). Finally, 24 mutations in TP53 were detected in 19 dogs (24.7%) and 19 were missense (79.2%), 3 frameshift (12.5%) and 2 nonsense variants (8.3%) (Fig. 4d and Supplementary Table 1). All but one mutation affected the p53 DNA-binding domain. Five dogs presented two TP53 variants and one of them also carried a frameshift deletion and a nonsense variant on the same allele.
We filtered the protein coding somatic variants for known cancer genes using the COSMIC dataset (v92, 723 genes) and a total of 172 cancer genes with at least one protein coding variant were retrieved (Supplementary Data 4). By comparing the genetic alterations in cDLBCLs with several human DLBCL datasets, 1902 genetic alterations in driver genes were shared, including MYC and TP53 (Supplementary Data 4).
To explore patterns of co-mutation and mutual exclusivity, we examined pairwise overlaps using Fisher’s test and mutual exclusion 9 among the top cancer genes. Significant pairwise interactions (FDR=0.1) are depicted in Fig. 5. Notably, PLEC mutations exclusively co-occurred with SETD2, while MAP3K14 mutations were mutually exclusive with TRAF3.
The tumor mutational burden (TMB) ranged from 0.12 to 2.94 (mean, 0.32) and was significantly higher in tumors presenting SETD2 and TP53 mutations compared to wild type (wt) tumors (p<0.05). To identify the potential contributions of the mutational processes within tumor exomes, we applied a Bayesian treatment of non-negative matrix factorization approach. The analysis revealed that the predominant mutational process in all the tumors was Signature 1A which is the product of cytosine deamination at CpG sites due to ageing (Fig. 6).
To construct a comprehensive view of the common genetic alterations underlying cDLBCLs, we grouped genetic aberrations targeting specific oncogenic signaling pathways, including genes and copy number aberrations (CNAs) affected in >5% of cases. In addition to activation of the NF-kB signaling pathway (80% of the dogs), we identified chromatin remodeling and histone modifications (73%), and TNF signaling pathways (66%). The cell cycle was also frequently altered (64%), including alterations in genes with known roles in the G1/S checkpoint (amplifications of MYC and mutations of FBXW7) and the G2/M checkpoint (mutations of TP53) (Fig. 7).
CNAs in cDLBCL are associated with clinical outcome
Regions of somatic CNA were defined using WES segmentation data. The proportion of the tumor genome showing chromosomal aberrations ranged from 0.1 to 19.9% per genome (mean 6.8%) and the median number of CNAs was 35 (range, 1-487) (Fig. 8).
To reliably detect recurrent changes in tumors, the GISTIC algorithm was used. A total of 78 significant regions were identified including 24 gains mainly involving CFA 5, 6, 9 13, and 17, 27 and 31, as well as 54 losses mainly involving CFA 1, 6, 17, 26, and 38 (Supplementary Data 5). In line with previous observations, the most extensive and highly recurrent CNAs were retrieved in CFA 13 overlapping the MYC locus (44%) and CFA 31 (30%), followed by focal gains in CFA 5 and 27. Losses of CFA 14 were identified in 15 dogs (19%). A total of 17 and 20 CNAs were found significantly associated with TTP and LSS, respectively (FDR<0.05) (Supplementary Data 6). In particular, the gain in the whole CFA13 (Supplementary Fig.2 and Supplementary Data 7) was significantly associated with TTP in dogs treated with chemo-immunotherapy.
The potential associations between CNAs and somatic mutations were investigated. We found that focal losses in CFA8 and large broad gains in CFA 13 were strongly associated with TRAF3, SETD2, and TP53 alterations. Moreover, gains in CFA 31 were only associated with TP53 mutations and losses in CFA14 with POT1 mutations (Fig. 9).
The SNV and CNA profiles for each sample were used to analyze the corresponding clonal architecture and revealed the subclonal structure of cDLBCL. To minimize the effect of CNAs and regions with loss of heterozygosity, we considered SNVs in neutral copy number regions. The number of clusters ranged from one to seven (Supplementary Fig. 3) and the number of deleterious variants increased as the number of subclones increased (σ=0.245, p <0.001).
We compared the number of subclones with the clinical features and found a negative correlation between the number of subclones and TTP in dogs treated with chemotherapy only (σ=0.149, p=0.020).
Recurrently mutated genes associate with clinico-pathological features of cDLBCL
Purebred dogs were more frequently affected by POT1 mutations than mixed breed dogs. Further, dogs with TP53 and SETD2 mutations (19 and 24 dogs, respectively) were significantly older compared to dogs with wt genes (p=0.013 and p=0.001, respectively). In contrast, TRAF3 and PHC3 mutations occurred more frequently in younger dogs. SETD2 mutations significantly affected females (71% of SETD2mut dogs, p= 0.027) and were inversely correlated to bone marrow infiltration (12% in mutated dogs compared to 36% in wt ones, p=0.039). Among the dogs carrying SETD2 mutations, mixed breed dogs were prevalent, and the median weight of mutated dogs was lower than the wt group. Dogs with MYC and DDX3X mutations had a higher percentage of peripheral blood infiltration. Also, DDX3X mutations were significantly associated with bone marrow infiltration and a higher LDH level. In line with these results, all dogs with DDX3X mutations had stage V disease. Supplementary Tables 2 and 3 show the significant associations obtained by comparing mutations to clinico-pathological features.
The potential association of mutations with clinical outcome has never been explored in dogs with DLBCL. We first examined the association of specific mutations with survival in all cDLBCL cases. In dogs treated with chemotherapy only, TP53 mutations were correlated with a shorter TTP and LSS. The median TTP for TP53 mutant and wt dogs was 32 and 98 days, respectively, while the median LSS for TP53 mutant and wt dogs was 60 and 176 days, respectively (Fig. 1c, d and Supplementary Data 2). Similarly, dogs carrying MYC mutation were characterized by a shorter TTP and LSS (Fig. 1e, f and Supplementary Data 2).
In dogs treated with chemo-immunotherapy, POT1 mutations were correlated with a shorter LSS. The median LSS for dogs with mutant and POT1wt was 330 and 547 days, respectively (Fig. 1g and Supplementary Data 2). Likewise, dogs carrying TP53 mutation were associated with a shorter median LSS (Fig. 1h and Supplementary Data 2). Also, mutations in both genes were not associated with any other clinical parameters. Interestingly, the co-occurrence of TP53 and FBXW7 mutations resulted in shorter TTP compared to dogs with only TP53 mutation (Supplementary Table 4).
Finally, TMB was significantly associated with tumor-free interval but not with overall survival in dogs treated with chemo-immunotherapy. Also, the median TTP was significantly shorter in dogs with TMB >0.28 compared to those with a TMB<0.21 (Supplementary Fig. 4a and Supplementary Data 2), and the median LSS was shorter in dogs with TMB>0.28 compared to those with a TMB<0.21 (Supplementary Fig. 4b and Supplementary Data 2).
Aberrations in TRAF3, SETD2, POT1, TP53, MYC and DDX3X define specific transcriptional signatures in cDLBCL
Gene expression profiling is now used to classify human DLBCL subtypes with clinical relevance, whereas in dogs attempts to apply a similar approach have failed. Conversely, signatures defined by T-cell mediated immune response, apoptosis, JAK/STAT signaling, microenvironment, inflammatory response and p53 pathway are enriched in cDLBCL with a worse prognosis. Here, we applied a clustering analysis using these prognostic signatures to separate tumors and identify genetic alterations associated with the two groups. RNA-seq data were available for 43 dogs and using previously published nomenclature 21 tumors were classified as cold DLBCL and 22 as hot DLBCL 7. The outcome was worse in the second group, but not significantly. Also, the two groups shared the vast majority of cancer genes and CNA at statistically indistinguishable frequencies. However, cold cDLBCL had frequent mutations in TRAF3 (75%), conversely dogs belonging to hot DLBCL showed rare mutations in TRAF3 (13%). Gene Set Enriched Analysis (GSEA) indirectly confirmed this result revealing an enrichment of gene sets related to immune and inflammatory response and T-cell activation in TRAF3wt tumors compared to TRAF3mut (Supplementary Data 8).
We also investigated the downstream transcriptomic changes in cDLBCLs characterized by mutations in the most frequent candidate cancer genes. Differential analysis was performed comparing dogs with mutated and wt genes. A total of 867, 136, 92, 51, and 26 genes were differentially expressed in DDX3X, SETD2, TP53, MYC, and POT1 comparisons, respectively (Supplementary Table 5). By GSEA an enrichment of gene signatures involved in the regulation of factors of the translation initiation complex and human Burkitt lymphoma were identified in dogs bearing DDX3X mutations (Supplementary Data 9). Dogs with MYC mutations were characterized by signatures associated with tumor microenvironment and apoptosis (Supplementary Data 10). Finally, the expression levels of the immune signatures characterizing hot cDLBCLs were positively associated with TMB (Wilcoxon rank-sum test, p < 0.05).
Integration of -omics and clinico-pathological features predicts survival in cDLBCL
We developed a multivariate supervised learning approach for defining the association of survival with clinico-pathological variables, genetic features and gene expression data. Among all, the Cox model outperformed the random forest models and was considered for further steps (Supplementary Fig. 5a, b). The features were combined to generate a prediction score and cDLBCLs were divided into two subgroups based on their predicted risk (long and short survival). The best performing scores for LSS (AUROC= 0.95) and TTP (AUROC=0.87) were obtained with the following variables: age, bone marrow infiltration (%), treatment (chemotherapy vs chemo-immunotherapy), TP53 genetic status (mut vs wt), and STAP2 and G3BP2 gene expression as logCPM (Supplementary Fig. 5c, d). By excluding STAP2 and G3BP2 expression data AUROC for LSS and TTP dropped to 0.90 and 0.79, respectively (Supplementary Fig. 5a, b). Furthermore, when TP53 genetic data was removed from the analysis AUROC for LSS and TTP dropped further to 0.83 and 0.74, respectively (Supplementary Fig. 5a, b). To prospectively use these data in clinical practice we have developed an interactive webtool (http://compbiomed.hpc4ai.unito.it/canine-dlbcl) to predict survival and tumor relapse using clinico-pathological data, transcriptomic and genomic features.
CNAs identified by WGS are associated with clinical outcome
We further performed WGS in 10 selected dogs to investigate a possible correlation between genetic profiling and clinical outcome. All dogs were treated with chemo-immunotherapy and were divided into two groups, namely “bad responders'' (n=5) and “good responders'' (n=5).
TTP and LSS were significantly different between the two groups (Supplementary Fig. 6a, b). The median sequencing depth was 20 (range 15-26) for cDLBCLs and 12 (range 7-18) for normal samples. Based on WGS, a median of 35 somatic coding mutations (range: 27–343) was identified in “bad responders'' and 32 (range: 17–47) in good responders. Mutation signatures were analyzed, and three prevalent signatures were uncovered in “bad” and “good” responders (Supplementary Fig. 7a, b).
Although differential exposure analysis revealed no significant difference between the two groups, the signature in “good responders” that mainly contributed to the mutation counts matched COSMIC signature 1A, whereas this was true in only one “bad responder”.
In both groups two other signatures equally contributed to mutation spectra and resembled COSMIC signature 9 that is attributed to polymerase η and is implicated in somatic hypermutation in lymphoid cells, and signature 17, that has been previously described in human B-cell lymphomas. In addition, PCSK5 and OR1I1 were exclusively mutated in “bad responders”, whereas TASOR2, ENSCAFG00000038553 and ENSCAFG00000041925 were only mutated in “good responders” (Supplementary Data 11). CNA analysis identified two losses in CFA8 exclusively in “good responders” and one gain in CFA 13 in “bad responders”.
Confirmation of the negative prognostic value of mutated TP53 in an independent cohort of cDLBCL
We confirmed TP53 mutations by Sanger sequencing and further examined the exons 4-8 in a second group of 56 dogs affected by DLBCL, whose clinico-pathological features are reported in Supplementary Data 12. Fifteen dogs were mutated with 14 different mutations. Indeed, two dogs shared the same mutation. Eleven mutations were novel, while three were already reported in dbSNP albeit with unknown frequency. In three dogs, variants were classified as germline since retrieved in matched normal tissue. Among the somatic mutations, we classified nine missense, one frameshift deletion, one frameshift insertion, and one splice-acceptor variant. All missense mutations were predicted as deleterious by SIFT (Supplementary Data 13). Mutations in TP53 were associated with age and significantly enriched in dogs diagnosed with stage IV disease (p=0.001). The prognostic relevance of TP53 mutations was confirmed (Supplementary Data 14). Indeed, TP53mut dogs had a significantly shorter TTP and LSS compared to TP53wt dogs (Supplementary Fig. 8a, b).