The genomic landscape of canine diffuse large B-cell lymphoma identies distinct subtypes with clinical and therapeutic implications

Diffuse large B-cell lymphoma (DLBCL) is the most common lymphoid neoplasm in dogs, being characterized by a remarkable degree of clinical heterogeneity not completely explained by molecular data. This poses a major barrier to understanding the disease and response to therapy or when using dogs with DLBCL in clinical trials. We performed an integrated analysis of genome, exome and RNA sequencing in a cohort of 77 canine DLBCL to dene the genetic landscape of this tumor. A wide range of signaling pathways and cellular processes were found in common with human DLBCL, but the frequencies of the most recurrently mutated genes were different overall. We developed a prognostic model integrating clinical, exonic variants and transcriptomic features to predict outcome in dogs with DLBCL. These results comprehensively dene the genetic drivers of canine DLBCL and can be prospectively utilized to identify new therapeutic opportunities.


Introduction
Lymphoma in domestic dogs is considered a representative and highly predictive spontaneous model for human disease. In particular, the complex genetics interplay, the intact immune system, the environmental exposures and the increasing incidence represent powerful elements for translational studies 1 . Among the many lymphoma subtypes, canine diffuse large B-cell lymphoma (cDLBCL) is the most common, accounting for approximately 60-70% of hematological malignancies in this species 2 .
Current survival rates for cDLBCL after chemotherapy or chemo-immunotherapy are usually disappointing and dogs show markedly different clinical courses and treatment responses, demonstrating a heterogeneous clinical behavior and a di culty in anticipating the outcome 3 . Proposed cDLBCL prognostic classi cation systems are based on bone marrow in ltration, substage, mitotic rate and histological features (centroblastic and immunoblastic) without consideration of the mechanisms underlying tumorigenesis 4,5 . Transcriptomics have shed some light on the pathogenesis of cDLBCL, revealing similarities with its human counterpart, but also important differences that should be considered in veterinary and comparative clinical trials. Compared to normal B-cells, cDLBCL present active NF-kB signaling induced by antigen engagement of the B-cell receptor 6 . Additionally, up-regulation of several Toll-like receptors suggests a pathogenesis similar to human activated B-cell (ABC) DLBCL and the activation of immune-related signatures is correlated with an inferior outcome. Indeed, dogs with a shorter overall survival and tumor-free interval show a higher expression of transcripts coding for proteins involved in JAK/STAT signaling, microenvironment, immune system and p53 pathway 7 .
Recent studies have described the mutational spectrum of canine lymphomas and provided a comprehensive catalog of somatic mutations in coding regions. However, whole genome sequencing (WGS) has not yet been used to investigate canine DLBCL. One large study based on whole exome sequencing (WES) investigated canine B-cell lymphomas obtained from three predisposed breeds (Boxer, Golden Retriever and Cocker Spaniel) and found that both TRAF3 and MAP3K14 were frequently mutated 8 . Notably, FBXW7 mutations occurring in a speci c codon recurrently mutated in several human cancers were identi ed. Despite the large number of cases included in that study the mutations described may result breed-speci c, thereby masking the true heterogeneous mutational landscape of canine lymphoma.
Also, since tumors were not appropriately classi ed according to WHO criteria and clinico-pathological features were not reported, the clinical signi cance of these results are unknown.
In human medicine the integration of next generation sequencing technologies in clinical practice holds great promise for personalized medicine but correlations between genotype and phenotype are critical for the interpretation of these analyses. Veterinary oncology has only recently been modelling the same approach but the process has been hampered by several di culties. Firstly, large multi-institutional molecular studies comprising datasets of fully characterized canine tumors often suffer from a lack of funding. Secondly, understanding the link between molecular aberrations and prognosis is challenging since treatment and outcome are strongly in uenced by the owner. Thirdly, even if genetic alterations are de ned, their functional impact and clinical validation are often unknown, thus preventing the identi cation of new therapeutic targets and the prediction of prognosis.
To address these issues and to prospectively inform clinical trials of canine DLBCL, we performed a comprehensive multiomics pro ling of de novo diagnosed cDLBCL with the goal of clarifying the genetic changes within this tumor. Genetic data obtained from WES and WGS were correlated with clinicopathological features. Finally, an integrated model comprising mutations, copy number aberrations (CNAs) and transcriptome was designed to predict overall survival and tumor-free interval. Further, clinically signi cant mutations were validated in an independent cohort of cDLBCL.

Study population and clinical characteristics
We enrolled 77 de novo cDLBCL cases with matched normal tissues for WES and 10 cDLBCLs for WGS. A subset of the 77 dogs (n=43) were also analyzed for RNA-seq.

Landscape of somatic mutations in cDLBCL
Based on WES, the median sequencing depth of targeted regions was 265 (range 140-394) for tumors and 246 (range 110-625) for normal samples, with a mapping rate of 99%. Collectively, the total number of short somatic variants identi ed across tumors ranged from 93 to 2,899, with an average of 282. Of these variants, 10.3% to 28.7% were annotated as protein coding variants with an average of 18.4%, including 4.9% insertions and deletions (indels) and 95.1% single nucleotide variants (SNVs) (Fig. 2). Among the latter, 68.5% were missense (range, 47.3%-84.3%) ( Supplementary Fig. 1). By SIFT, 1,866 were classi ed as deleterious and 1,220 as tolerated. The full list of the nucleotide variants is reported in Supplementary Data 3. cDLBCL is characterized by recurrent mutations in speci c protein-coding genes A total of 2,831 protein coding genes showed a nonsynonymous somatic variant in at least one tumor for a total of 3,769 protein coding variants, and 2,368 genes were mutated only in one sample. More importantly, eight genes (TRAF3, SETD2, POT1, TP53, MYC, FBXW7, DDX3X, and TBL1XR1) were recurrently mutated in at least 15% of the dogs. The top 43 most frequently mutated genes are shown in Fig. 3. Several candidate cancer genes were previously identi ed as genetic drivers in canine cancers (TRAF3, SETD2, POT1, TP53), but others have never been reported in dogs (TBL1XR1, H3C8, DIAPH2, EHD3).
In concordance with previous studies, TRAF3 was the most frequently mutated gene in our cohort (53%). A total of 62 mutations were identi ed and 16 dogs carried multiple aberrations. Of these mutations, 41.9% and 29% were frameshift and nonsense variants, respectively, while 25.8% were missense (Supplementary Table 1). Exon 11 (ENSCAFT00000028719.4) was the most affected with a total of 31 mutations (Fig. 4a). The second most frequently mutated gene was SETD2. We identi ed 29 somatic mutations in 24 dogs (31%). Nonsense variants were the most frequent (37.9%), followed by frameshift and missense variants (31%) (Fig. 4b and  and 19 were missense (79.2%), 3 frameshift (12.5%) and 2 nonsense variants (8.3%) ( Fig. 4d and Supplementary Table 1). All but one mutation affected the p53 DNA-binding domain. Five dogs presented two TP53 variants and one of them also carried a frameshift deletion and a nonsense variant on the same allele.
We ltered the protein coding somatic variants for known cancer genes using the COSMIC dataset (v92, 723 genes) and a total of 172 cancer genes with at least one protein coding variant were retrieved (Supplementary Data 4). By comparing the genetic alterations in cDLBCLs with several human DLBCL datasets, 1902 genetic alterations in driver genes were shared, including MYC and TP53 (Supplementary Data 4).
To explore patterns of co-mutation and mutual exclusivity, we examined pairwise overlaps using Fisher's test and mutual exclusion 9 among the top cancer genes. Signi cant pairwise interactions (FDR=0.1) are depicted in Fig. 5. Notably, PLEC mutations exclusively co-occurred with SETD2, while MAP3K14 mutations were mutually exclusive with TRAF3.
The tumor mutational burden (TMB) ranged from 0.12 to 2.94 (mean, 0.32) and was signi cantly higher in tumors presenting SETD2 and TP53 mutations compared to wild type (wt) tumors (p<0.05). To identify the potential contributions of the mutational processes within tumor exomes, we applied a Bayesian treatment of non-negative matrix factorization approach. The analysis revealed that the predominant mutational process in all the tumors was Signature 1A which is the product of cytosine deamination at CpG sites due to ageing (Fig. 6).
To construct a comprehensive view of the common genetic alterations underlying cDLBCLs, we grouped genetic aberrations targeting speci c oncogenic signaling pathways, including genes and copy number aberrations (CNAs) affected in >5% of cases. In addition to activation of the NF-kB signaling pathway (80% of the dogs), we identi ed chromatin remodeling and histone modi cations (73%), and TNF signaling pathways (66%). The cell cycle was also frequently altered (64%), including alterations in genes with known roles in the G1/S checkpoint (ampli cations of MYC and mutations of FBXW7) and the G2/M checkpoint (mutations of TP53) (Fig. 7).

CNAs in cDLBCL are associated with clinical outcome
Regions of somatic CNA were de ned using WES segmentation data. The proportion of the tumor genome showing chromosomal aberrations ranged from 0.1 to 19.9% per genome (mean 6.8%) and the median number of CNAs was 35 (range, 1-487) (Fig. 8).
To reliably detect recurrent changes in tumors, the GISTIC algorithm was used. A total of 78 signi cant regions were identi ed including 24 gains mainly involving CFA 5, 6, 9 13, and 17, 27 and 31, as well as 54 losses mainly involving CFA 1, 6, 17, 26, and 38 (Supplementary Data 5). In line with previous observations, the most extensive and highly recurrent CNAs were retrieved in CFA 13 overlapping the MYC locus (44%) and CFA 31 (30%), followed by focal gains in CFA 5 and 27. Losses of CFA 14 were identi ed in 15 dogs (19%). A total of 17 and 20 CNAs were found signi cantly associated with TTP and LSS, respectively (FDR<0.05) (Supplementary Data 6). In particular, the gain in the whole CFA13 (Supplementary Fig.2 and Supplementary Data 7) was signi cantly associated with TTP in dogs treated with chemo-immunotherapy.
The potential associations between CNAs and somatic mutations were investigated. We found that focal losses in CFA8 and large broad gains in CFA 13 were strongly associated with TRAF3, SETD2, and TP53 alterations. Moreover, gains in CFA 31 were only associated with TP53 mutations and losses in CFA14 with POT1 mutations (Fig. 9).
The SNV and CNA pro les for each sample were used to analyze the corresponding clonal architecture and revealed the subclonal structure of cDLBCL. To minimize the effect of CNAs and regions with loss of heterozygosity, we considered SNVs in neutral copy number regions. The number of clusters ranged from one to seven ( Supplementary Fig. 3) and the number of deleterious variants increased as the number of subclones increased (σ=0.245, p <0.001).
We compared the number of subclones with the clinical features and found a negative correlation between the number of subclones and TTP in dogs treated with chemotherapy only (σ=0.149, p=0.020).
Recurrently mutated genes associate with clinico-pathological features of cDLBCL Purebred dogs were more frequently affected by POT1 mutations than mixed breed dogs. Further, dogs with TP53 and SETD2 mutations (19 and 24 dogs, respectively) were signi cantly older compared to dogs with wt genes (p=0.013 and p=0.001, respectively). In contrast, TRAF3 and PHC3 mutations occurred more frequently in younger dogs. SETD2 mutations signi cantly affected females (71% of SETD2mut dogs, p= 0.027) and were inversely correlated to bone marrow in ltration (12% in mutated dogs compared to 36% in wt ones, p=0.039). Among the dogs carrying SETD2 mutations, mixed breed dogs were prevalent, and the median weight of mutated dogs was lower than the wt group. Dogs with MYC and DDX3X mutations had a higher percentage of peripheral blood in ltration. Also, DDX3X mutations were signi cantly associated with bone marrow in ltration and a higher LDH level. In line with these results, all dogs with DDX3X mutations had stage V disease. Supplementary Tables 2 and  3 show the signi cant associations obtained by comparing mutations to clinico-pathological features.
The potential association of mutations with clinical outcome has never been explored in dogs with DLBCL. We rst examined the association of speci c mutations with survival in all cDLBCL cases. In dogs treated with chemotherapy only, TP53 mutations were correlated with a shorter TTP and LSS. The median TTP for TP53 mutant and wt dogs was 32 and 98 days, respectively, while the median LSS for TP53 mutant and wt dogs was 60 and 176 days, respectively ( In dogs treated with chemo-immunotherapy, POT1 mutations were correlated with a shorter LSS. The median LSS for dogs with mutant and POT1wt was 330 and 547 days, respectively ( Fig. 1g and Supplementary Data 2). Likewise, dogs carrying TP53 mutation were associated with a shorter median LSS ( Fig. 1h and Supplementary Data 2). Also, mutations in both genes were not associated with any other clinical parameters. Interestingly, the co-occurrence of TP53 and FBXW7 mutations resulted in shorter TTP compared to dogs with only TP53 mutation (Supplementary Table 4).
Finally, TMB was signi cantly associated with tumor-free interval but not with overall survival in dogs treated with chemo-immunotherapy. Also, the median TTP was signi cantly shorter in dogs with TMB >0.28 compared to those with a TMB<0. Aberrations in TRAF3, SETD2, POT1, TP53, MYC and DDX3X de ne speci c transcriptional signatures in cDLBCL Gene expression pro ling is now used to classify human DLBCL subtypes with clinical relevance, whereas in dogs attempts to apply a similar approach have failed. Conversely, signatures de ned by Tcell mediated immune response, apoptosis, JAK/STAT signaling, microenvironment, in ammatory response and p53 pathway are enriched in cDLBCL with a worse prognosis. Here, we applied a clustering analysis using these prognostic signatures to separate tumors and identify genetic alterations associated with the two groups. RNA-seq data were available for 43 dogs and using previously published nomenclature 21 tumors were classi ed as cold DLBCL and 22 as hot DLBCL 7 . The outcome was worse in the second group, but not signi cantly. Also, the two groups shared the vast majority of cancer genes and CNA at statistically indistinguishable frequencies. However, cold cDLBCL had frequent mutations in TRAF3 (75%), conversely dogs belonging to hot DLBCL showed rare mutations in TRAF3 (13%). Gene Set Enriched Analysis (GSEA) indirectly con rmed this result revealing an enrichment of gene sets related to immune and in ammatory response and T-cell activation in TRAF3wt tumors compared to TRAF3mut (Supplementary Data 8).
We also investigated the downstream transcriptomic changes in cDLBCLs characterized by mutations in the most frequent candidate cancer genes. Differential analysis was performed comparing dogs with mutated and wt genes. A total of 867, 136, 92, 51, and 26 genes were differentially expressed in DDX3X, SETD2, TP53, MYC, and POT1 comparisons, respectively (Supplementary Table 5). By GSEA an enrichment of gene signatures involved in the regulation of factors of the translation initiation complex and human Burkitt lymphoma were identi ed in dogs bearing DDX3X mutations (Supplementary Data 9). Dogs with MYC mutations were characterized by signatures associated with tumor microenvironment and apoptosis (Supplementary Data 10). Finally, the expression levels of the immune signatures characterizing hot cDLBCLs were positively associated with TMB (Wilcoxon rank-sum test, p < 0.05).

Integration of -omics and clinico-pathological features predicts survival in cDLBCL
We developed a multivariate supervised learning approach for de ning the association of survival with clinico-pathological variables, genetic features and gene expression data. Among all, the Cox model outperformed the random forest models and was considered for further steps ( Supplementary Fig. 5a, b). The features were combined to generate a prediction score and cDLBCLs were divided into two subgroups based on their predicted risk (long and short survival). The best performing scores for LSS (AUROC= 0.95) and TTP (AUROC=0.87) were obtained with the following variables: age, bone marrow in ltration (%), treatment (chemotherapy vs chemo-immunotherapy), TP53 genetic status (mut vs wt), and STAP2 and G3BP2 gene expression as logCPM ( Supplementary Fig. 5c, d). By excluding STAP2 and G3BP2 expression data AUROC for LSS and TTP dropped to 0.90 and 0.79, respectively (Supplementary Fig. 5a, b). Furthermore, when TP53 genetic data was removed from the analysis AUROC for LSS and TTP dropped further to 0.83 and 0.74, respectively (Supplementary Fig. 5a, b). To prospectively use these data in clinical practice we have developed an interactive webtool (http://compbiomed.hpc4ai.unito.it/canine-dlbcl) to predict survival and tumor relapse using clinicopathological data, transcriptomic and genomic features.

CNAs identi ed by WGS are associated with clinical outcome
We further performed WGS in 10 selected dogs to investigate a possible correlation between genetic pro ling and clinical outcome. All dogs were treated with chemo-immunotherapy and were divided into two groups, namely "bad responders'' (n=5) and "good responders'' (n=5).
Although differential exposure analysis revealed no signi cant difference between the two groups, the signature in "good responders" that mainly contributed to the mutation counts matched COSMIC signature 1A, whereas this was true in only one "bad responder".
In both groups two other signatures equally contributed to mutation spectra and resembled COSMIC signature 9 that is attributed to polymerase η and is implicated in somatic hypermutation in lymphoid cells, and signature 17, that has been previously described in human B-cell lymphomas. In addition, PCSK5 and OR1I1 were exclusively mutated in "bad responders", whereas TASOR2, ENSCAFG00000038553 and ENSCAFG00000041925 were only mutated in "good responders" (Supplementary Data 11). CNA analysis identi ed two losses in CFA8 exclusively in "good responders" and one gain in CFA 13 in "bad responders".
Con rmation of the negative prognostic value of mutated TP53 in an independent cohort of cDLBCL We con rmed TP53 mutations by Sanger sequencing and further examined the exons 4-8 in a second group of 56 dogs affected by DLBCL, whose clinico-pathological features are reported in Supplementary Data 12. Fifteen dogs were mutated with 14 different mutations. Indeed, two dogs shared the same mutation. Eleven mutations were novel, while three were already reported in dbSNP albeit with unknown frequency. In three dogs, variants were classi ed as germline since retrieved in matched normal tissue. Among the somatic mutations, we classi ed nine missense, one frameshift deletion, one frameshift insertion, and one splice-acceptor variant. All missense mutations were predicted as deleterious by SIFT (Supplementary Data 13). Mutations in TP53 were associated with age and signi cantly enriched in dogs diagnosed with stage IV disease (p=0.001). The prognostic relevance of TP53 mutations was con rmed (Supplementary Data 14). Indeed, TP53mut dogs had a signi cantly shorter TTP and LSS compared to TP53wt dogs ( Supplementary Fig. 8a, b).

Discussion
The application of genetic and transcriptomic analyses has led to an increased understanding of the biology of human DLBCL and allowed the design of target speci c therapeutic approaches. Here, we integrated clinical features, somatic mutations, CNAs, and transcriptome in cDLBCL providing new data on the genetic hallmarks of this tumor and identifying multiple mutated genes associated with outcome thereby providing potential therapeutic targets.
TRAF3, mutated in 53% of the dogs, was the most frequently affected gene in our series. The gene product is a part of a complex including TRAF3 itself, TRAF2, BIRC2, BIRC3, and MAP3K14. The disruption of this complex leads to the activation of the non-canonical NF-κB pathway. TRAF3 inactivation has been described in both canine and human lymphoid neoplasms 8,10−15 , including in 15% of human DLBCL cases in which its contributes to activation of the NF-κB signaling cascade 16 . Indeed, in our series, dogs with mutated TRAF3 presented an active NF-κB transcriptome program and additional genes encoding members of the complex were recurrently mutated, especially MAP3K14. The latter codes for a kinase (also known as NIK, NF-κB inducing kinase), which phosphorylates NFKB2 (p100), causing its proteasomal processing and the formation of p52-containing NF-κB dimers which translocate into the nucleus to transactivate target genes. MAP3K14 mutations were largely mutually exclusive with TRAF3 mutations and overall, 80% of cDLBCL presented genetic lesions compatible with NF-κB activation, emphasizing the importance of this pathway in cDLBCL pathogenesis. These data provide potential therapeutic targets [17][18][19] but also highlight lesions associated with resistance to BTK inhibitors 14,20 with implications for the management of dogs with cDLBCL and for the use of these animals as models for the human disease.
A total of 31% of dogs harbored mutations in SETD2, which was the second most frequently altered gene. SETD2 is a methyltransferase with various substrates and multiple lines of evidence support its tumor suppressor role 21 . Its most well-characterized function is the trimethylation of lysine 36 on histone H3 (H3K36me3). Loss of this histone mark can lead to deregulated methylation of intragenic regions, RNA splicing, homologous recombination and mismatch repair. However, SETD2 also has non-histone substrates, including microtubules and transcription factors (for example, TP53), whose aberrant modi cation can also contribute to cancer development and progression. Mutations in SETD2 have been reported in 20-40% of canine osteosarcomas 22 as well as in human tumors, including lymphomas [23][24][25][26][27] . Differently from our observations in cDLBCL, SETD2 mutations are present in less than 10% of human DLBCL 26,27 , but more common in T-cell lymphomas [23][24][25] . SETD2 was not the only gene encoding proteins involved in chromatin remodeling and transcription regulation. Over 70% of cDLBCL contained at least one mutated gene of this class, including histone 3 members (H3C8 or H3C12, 10%), KDM6A, SUZ12 (5%), KDM2A, KDM3B, and EZH2 (3%) (Fig. 10).
The mutations in H3C8 and H3C12 occurred in the same hotspot, determining amino acid 27 (or 28 according to COSMIC annotation) conversion from lysine (K) into methionine (M). This mutation has also been observed in human pediatric gliomas, where it de nes a speci c entity termed diffuse midline glioma, which is an in ltrative midline high-grade glioma with predominantly astrocytic differentiation 28 . The mutation inhibits the activity of the polycomb repressive complex 2 (PRC2), composed of the K27 histone methyltransferase EZH2 (enhancer of zeste homolog 2) and the core accessory proteins EED, SUZ12, and RbAp48 29 . Since methionine cannot be methylated by EZH2, gliomas bearing the K27M mutation present a global reduction of H3K27me levels, a modi cation associated with gene silencing, and DNA hypomethylation at many loci. However, perhaps due to a redistribution of the PRC2 complex, H3K27M mutated gliomas still retain a substantial number of genes with the H3K27me3 mark and are dependent on the remaining PRC2 enzymatic activity [29][30][31] .
In cDLBCL we also observed recurrent inactivating heterozygous mutations in EZH2 and SUZ12. The removal of di-and trimethyl groups from H3K27 is done by two histone demethylases, one of which, UTX, is encoded by KDM6A. KDM6A was mutated in 5% cDLBCL. UTX forms a complex with H3K4 methyltransferases MLL2 (KMT2D)/MLL3 (KMT2C) 29 . The inactivation of the two methyltransferases is observed in 20-30% of human DLBCL 32,33 and it determines a diminished global H3K4 methylation with deregulation of CD40, Toll-like, and B-cell receptor signaling pathways 34,35 .
In our series we didn't nd any mutations in the acetyltransferases CREBBP and EP300, which activate transcription via acetylation of histone H3 lysine 27 (H3K27Ac). Besides impairing PRC2 activity, H3K27M might also affect acetyltransferase activity, leading to aberrant gene expression and enhancer dysfunction 36 , and this could at least partially mimic the effects of CREBBP and EP300 mutations.
KDM2A (JHDM1A/FBXL11) and KDM3B (JMJD1B/ JMJD1B) are two histone lysine demethylases that catalyze the demethylation of transcriptionally repressive mono-and di-methylated histone H3 lysine 36 (H3K36me1/me2) and 9 (H3K9me1/me2), respectively 37 . Not much is known about the roles of these two enzymes in cancer, which appears to be dependent on tumor type 37 . However, rare mutations are reported in human B-cell lymphomas 38 .
We also identi ed several novel targets of mutation in canine cancers, including in particular TBL1XR1. This gene is frequently mutated in human ABC DLBCL, speci cally in the genetically de ned MCD/Cluster 5 subtypes, which are characterized by frequent extranodal localization, immune-escape lesions, and an unfavorable clinical outcome 39,40 . TBL1XR1 mutations were associated with a poor prognosis but the low frequency of dogs with this alteration prevented statistical signi cance being reached. TBL1XR1 is a core component of the SMRT/NCOR1 transcriptional repressor complexes, thus also contributing to transcription regulation.
Similarly to the NF-κB signaling cascade, the high frequency of alterations in genes involved in chromatin regulation and transcription regulation is of dual relevance to veterinary and comparative medicine.
Evidence from preclinical studies indicate the potential use of therapeutic agents including EZH2 inhibitors for KDM6A 41 and H3K27M mutants 31 , demethylating agents for KDM6A mutants and HDAC3 inhibitors for TBL1XR1 mutants 42 . The latter 43 might also show activity in H3K27M mutants, due to the possibly reduced acetyltransferase activity of CREBBP and EP300 on H3K27M 36 . LSD1 inhibitors 44,45 and especially KDM5 inhibitors 46 could work in KDM6A mutants, which might have concomitant deregulated MLL2/MLL3 activity 29 . However, it is important to keep in mind that although cDLBCL and its human counterpart appear phenotypically similar, subtle genetic differences exist between the two.
In our series, we identi ed recurrently mutated genes found in WES studies of other canine cancers including osteosarcoma, melanoma and lymphoma 22,47 . Compared to the previous two studies in canine lymphoma, the frequency of mutations for the top four genes was overall higher in the current work. This difference could be attributable to several reasons. First, we included fully characterized cDLBCL, whereas in previous studies, a general diagnosis of B-cell lymphoma was reported and mostly obtained by ne needle aspiration. Second, in our series, normal DNA was available for every dog and used as a match for variant calling. Finally, from a technical point of view, a wider whole exome enrichment kit was used here.
Recent advances in chemo-immunotherapy have offered new options for the treatment of canine cancers. The addition of APAVAC to CHOP-based chemotherapy has dramatically prolonged both remission time and overall survival in dogs with DLBCL. This data was con rmed in our cohort, but here we also identi ed genetic markers that were able to predict clinical behavior. While immune responses are generally considered to be durable and adaptable, thus conferring a signi cant advantage over chemotherapy as single treatment modality, not all patients will respond to immunotherapy 3 and pulling out the non-responders is challenging. In light of this, predicting a patient's response to immunotherapy to achieve a better outcome or save treatment costs remains an active area of clinical research. Here, we identi ed two groups of animals differing for their clinical outcome. Dogs older than 10 years, having bone marrow involvement, and TP53 mutations had a poor outcome and a small bene t from chemoimmunotherapy. Conversely, the absence of such features identi ed a cDLBCL subset with good prognosis and an important gain in survival if treated with chemo-immunotherapy.
The negative prognostic impact of TP53 mutations has been reported in human DLBCL 39,40,48 , but it is still unclear in dogs. In our series, TP53 mutations were frequently found by WES, and these ndings were validated in a second group of dogs, maintaining a similar frequency and prognostic signi cance. Loss of TP53 causes disruption of checkpoint responses to DNA damage and contributes to genomic and chromosomal instability. Here, most of the mutations affected the p53 DNA-binding domain and we can thus hypothesize an inactivating effect, even if this would require an experimental validation. In addition, the co-occurrence of TP53 and FBXW7 mutations resulted in a worse outcome than TP53 mutation alone.
Other than TP53, TRAF3 mutants were associated with a better outcome than TRAF3 wild-type animals; conversely POT1 mutations, previously reported in cDLBCL, were associated with poor outcome in our dogs. POT1 mutations contribute to cancer development in multiple ways, including human chronic lymphocytic leukemia, where the predominant effects of POT1 loss are increased telomere length in cells with telomerase activity and genomic instability 49 contributing to tumorigenesis.
In conclusion, our results suggest that clinical trials testing new targeted agents in cDLBCL should be evaluated in the context of clinical features and genetic aberrations, including mutations affecting TP53, non-canonical NF-κB and chromatin remodeling.

Animal recruitment and samples acquisition
Dogs with newly diagnosed, previously untreated, multicentric cDLBCL of any World Health Organization (WHO) clinical stage were included in the study. To be eligible for enrollment, dogs had to undergo a complete staging work-up, consisting of history and physical examination, complete blood cell count with differential, serum biochemistry pro le, thoracic radiographs and abdominal ultrasound, cytological evaluation of liver and spleen regardless of the ultrasonographic appearance, and immunophenotype determined by ow cytometry (FC) on a lymph node (LN) aspirate, peripheral blood and bone marrow aspirate. Before the initiation of therapy, all dogs underwent lymphadenectomy to con rm DLBCL histotype by routine histology and immunohistochemistry (CD3, CD20, CD79 and PAX5) and to provide material for vaccine generation 50 . A portion of the neoplastic lymph node was always preserved in RNAlater and stored at -80°C. In addition to tumor samples, skin punch biopsies were obtained from all dogs included in the study to provide matched paired normal tissue. Dogs' owners were required to give written informed consent. Approval for this study was granted by the Ministero dell'Istruzione, dell' Università e della Ricerca Ethical Board (Number RBSI14EDX9).
Depending on owner's preference, dogs were treated either with chemotherapy or with chemoimmunotherapy. Unvaccinated dogs received a CHOP-based protocol, including L-asparaginase, vincristine, cyclophosphamide, doxorubicin, lomustine and prednisone.
Regarding dogs receiving chemo-immunotherapy, the detailed method of the APAVAC vaccine preparation and the protocol used has been described elsewhere 50 . Brie y, dogs received L-asparaginase, vincristine, cyclophosphamide, doxorubicin, lomustine, prednisone, and a total of 8 intradermal injections of 0.5 ml vaccine.
Treatment response was classi ed as complete remission (CR), partial remission (PR), stable disease (SD) or progressive disease (PD). Response was evaluated at each therapeutic session and was required to last for at least 28 days. Follow-up evaluation consisted of monthly physical examination, peripheral LNs size measurement and cytological evaluation during the rst year, and every other month thereafter.
Relapse was de ned as the clinical reappearance and cytological evidence of lymphoma with or without FC con rmation in any anatomical site in dogs having experienced CR. Once relapse was con rmed, a complete restaging work-up was undertaken, and a second round of chemotherapy was offered. TTP was calculated as the interval between initiation of treatment and PD or relapse, whereas LSS was measured as the interval between initiation of treatment and lymphoma-related death. Dogs lost to follow-up or dead for lymphoma-unrelated causes before PD, as well as those still in CR at the end of the study, were censored for TTP analysis. Dogs alive at the end of the study, lost to follow-up or dead due to causes other than lymphoma were censored for LSS analysis 51 52 . Libraries were quanti ed using the Qubit DNA Assay Kit in a Qubit 2.0 Fluorometer (Thermo Fischer, Foster City, CA, USA) and quality was assessed using the Bioanalyzer 2100 instrument. For WGS, ten libraries (fragments ranging from 300 to 400 bp) were then pooled and sequenced on an Illumina NovaSeq 6000 platform in a paired-end (150 PE) mode. The chosen target sequencing coverage was 30x and 250x for WGS and WES, respectively.
RNA-Seq, WES and WGS reads quality control, alignment and preprocessing Data quality control was rst performed on raw Illumina reads for RNA-Seq, WES and WGS using FastQC v.0.11.9 (https://www.bioinformatics.babraham.ac.uk/projects/download.html) software. RNA-Seq reads trimming, mapping, gene quanti cation and differential expression analysis steps were performed using the pipeline previously described in Aresu et al. 7 . We calculated the two prognostic transcriptomic subgroups as previously described 53 . Brie y, we retrieved immune gene sets for immune cells, tumor in ltrating lymphocytes (TILs), proin ammatory molecules, cytokines and cytokine receptors, regulatory T-cells and immune checkpoints, and performed single sample GSEA to derive the enrichment score of each immune-related term. By applying unsupervised consensus clustering analysis, we separated cDLBCL into two subgroups (high immunity = "hot" and low immunity = "cold"). (http://dogsd.big.ac.cn/) 62 . Strelka was run together with Manta for best somatic indels performance and for the processing of WES samples, the -exome and -callRegions parameters were added to restrict the calls to the targeted regions only. Resulting raw somatic calls of SNVs and indels from the two callers were then ltered using caller-speci c lters and hard lters as de ned in the respective developer's guidelines. Brie y, Mutect2 lters were performed with GATK4 FilterMutectCalls tool and comprised, for instance, the removal of calls under germline risk, calls with low depth, artifacts related to sequencing platforms or repeat regions; hard lters were based on the ltering of variants present in dbSNP and PONs to avoid false positive calls (https://software.broadinstitute.org/gatk/documentation/article? id=11136). For Strelka, the default lters were used (https://github.com/Illumina/strelka/blob/v2.9.x/docs/userGuide/README.md#somatic). A consensus calling was then applied to ltered somatic calls from the two callers (in VCF version 4.1/4.2 format) by overlapping the remaining variants using the bcftools software package's -isec option 63 . Finally, resulting consensus VCF les were annotated with ANNOVAR 64 . Mutation Annotation Format (MAF) using vcf2maf utility (https://github.com/mskcc/vcf2maf) was also created. Annotated VCFs and MAFs were used for all the downstream analyses.

Tumor Mutational Burden and Mutational Signatures analysis
Tumor mutation burden (TMB) was calculated by dividing the total number of single nucleotide variants by the amount of the genome covered by exome sequencing and the result was expressed as the number of SNV per megabase (Mb). Mutational signatures were predicted both in WES and WGS. Brie y, SNVs mutations were analyzed in the context of the anking nucleotides (96 possible trinucleotide combinations) using a full Bayesian treatment to the non-negative matrix factorization (NMF) model implemented in the SigneR 65 R package. The number of mutational processes per Mb and their signatures have been evaluated for each sample, and the identi ed ones were compared with validated signatures in human cancer 66,67 . Furthermore, WGS "bad" and "good responders" were compared in a differential exposure analysis to highlight signatures that were differentially active among the two groups (p < 0.05).

Copy Number Analysis
CNA analysis was performed on WES and WGS data using the NEXUS Copy Number v.10.0 software (Biodiscovery Inc., El Segundo, CA). CNAs and loss of heterozygosity (LOH) were identi ed starting from pre-processed BAM les using a matched approach ("BAM ngCGH-matched") and against the CanFam3.1 reference genome. Then, a FASST2 segmentation algorithm with a signi cance threshold of 1.0e-12 was applied. The dbSNP was also given to the software to assess germline events. In the case of WGS, the bin size was reduced from 1000 (default for WES) to 500. Aberrations were de ned as a minimum of 3 consecutive segments and a maximum probe spacing of 1000 kbp between adjacent probes before breaking a segment. Segments were classi ed with log2 tumor/reference ratio value of >0.5 as high gains, 0.5 to 0.2 as gains, -0.25 to -1.5 as losses, <-1.5 as big losses, and the heterozygous imbalance threshold was set to 0.4. Furthermore, independently from the automatically generated CNA calls, each sample was visually inspected for using Nexus Copy Number software v.10.0 (Biodiscovery Inc., El Segundo, CA). Recurrent CNAs were determined within NEXUS using the GISTIC algorithm with a G-score cut-off of G > 1.0 and a signi cance of Q < 0.05. CNA frequency comparisons amongst groups (i.e. clinical features, presence/absence of somatic mutations) were performed with a two-tailed Fisher's exact test (p < 0.05).

Intra-Tumor Heterogeneity analysis
To verify the presence of multiple subclonal populations of cells tumor samples were inferred using sciClone R package 68 applying default parameters according to the developer's manual. Copy number segments were extracted from WES samples using circular binary segmentation (CBS) method implemented in CNVkit 69 , whereas mutations were extracted from Strelka/Mutect2 intersection VCF les.
LOH regions inferred from NEXUS Copy Number were excluded.

Functional annotation analysis
The analysis comparing mutational processes and transcriptome to evaluate enriched gene sets in WT/mutated samples was performed using the GSEA in a preranked mode, with a threshold of signi cance of FDR <5% 70

Prediction models
Two predictive models were tested and implemented in the scikit-survival module (version 0.15.0) in Python (version 3.8.9): Cox's proportional hazard's model with elastic net penalty and random survival forest. Separate models were built for LSS and TTP. Performance of the models was evaluated by eightfold cross-validation with two holdout sets, repeated twenty times on random permutations of the samples. One holdout set was needed to select the optimal model by hyperparameter grid search and the other was used to evaluate the nal performance of the optimal model. Performance was evaluated by the area under the receiver operating characteristic (AUROC) for classifying dogs with survival time lower than the median survival in the dataset. Given the limited number of samples and the much larger number of available features and the observed poor performances of models tted using too many features, we decided to eschew automatic feature selection methods. Thus, various datasets with different features were tested to nd those that were most predictive and combine them in a single predictive model. Categorical clinical features were one-hot-encoded producing 41 numeric and binary features.
We tested a dataset of 77 samples with 2832 genetic mutations (binary) and clinical features and a dataset of 43 samples with clinical features and the top 100 most signi cantly differentially expressed between tumors and controls. Features whose regression coe cient were consistently different from zero among the cross-validation ttings of the best performing models were selected and put together in the nal dataset.

Statistical analysis
To explore the associations between clinico-pathological variables and mutated genes and identify different classes of clinico-pathological variables in mutated or wt individuals, we built classi cation trees using the recursive partitioning algorithm implemented in the party R package.
Survival analysis was conducted using survival and survminer R packages. The following clinicopathological variables were tested for their in uence on TTP and LSS by means of univariate and multivariate Cox proportional hazard model: treatment (chemotherapy vs chemo-immunotherapy ), breed (pure vs mixed), sex (female vs male), age (<10 years vs ≥10 years), weight (<10kgs vs ≥10kgs), stage