Single-cell RNA sequencing reveals the cellular and molecular heterogeneity of treatment-naïve primary osteosarcoma in dogs

Abstract Osteosarcoma (OS) is a heterogeneous, aggressive malignancy of the bone that disproportionally affects children and adolescents. Therapeutic interventions for OS are limited, which is in part due to the complex tumor microenvironment (TME) that has proven to be refractory to immunotherapies. Thus, there is a need to better define the complexity of the OS TME. To address this need, we used single-cell RNA sequencing (scRNA-seq) to describe the cellular and molecular composition of the TME in 6 treatment-naïve dogs with spontaneously occurring primary OS. Through analysis of 35,310 cells, we identified 30 distinct immune cell types, 9 unique tumor populations, 1 cluster of fibroblasts, and 1 cluster of endothelial cells. Independent reclustering of major cell types revealed the presence of follicular helper T cells, mature regulatory dendritic cells (mregDCs), and 8 transcriptomically distinct macrophage/monocyte populations. Cell-cell interaction inference analysis predicted that mregDCs and tumor-associated macrophages (TAMs) play key roles in modulating T cell mediate immunity. Furthermore, we used publicly available human OS scRNA-seq data to complete a cross-species cell type gene signature homology analysis. The analysis revealed a high degree of cell type gene signature homology between species, suggesting the cellular composition of OS is largely conserved between humans and dogs. Our findings provide key new insights into the biology of canine OS and highlight the conserved features of OS across species. Generally, the data presented here acts as a cellular and molecular roadmap of canine OS which can be applied to advance the translational immuno-oncology research field.


Introduction
Osteosarcoma (OS) is an aggressive malignancy of the bone that disproportionally impacts children and adolescences. Despite a profound impact on the lives of affected individuals, effective therapeutics are lacking, with minimal advancements since the introduction of combined surgical excision and adjuvant chemotherapy in 1986 1 . Slow advancements in the development of OS therapeutics are, in part, due to the relatively rare incidence which limits patient accrual into clinical trials. In recent years, there has been increased interest in using large animal models to evaluate and validate the potential of immunotherapeutics for various cancers 2,3 . Spontaneously occurring canine OS is regarded as an ideal model of human OS due to higher disease prevalence in dogs, similar genetics and pathology, and the immune competent status of dogs 4 . Although dogs have been identi ed as a valuable pre-clinical model, immunological reagent limitations have restricted researcher's ability to effectively characterize the canine OS tumor microenvironment (TME).
Osteosarcoma has a complex TME that consists of malignant osteoblasts, osteoclasts, broblasts, macrophages, and lymphocytes as well as numerous other stromal and immune components. Together, the OS TME creates an immune suppressive milieu that hinders antitumor immune responses 5 .
Researchers have turned to the TME with the objective of understanding and targeting the cellular constituents that promote immune suppression 6,7 . Unlike many other cancer types, there have been reports in both humans and dogs that suggest increased macrophage abundance in OS reduces metastatic rate and enhances survival 8- 10 . This unexpected nding and the ill-de ned mechanisms of immune suppression in the OS TME highlights the need for a deeper understanding of OS pathobiology.
In recent years, single-cell RNA sequencing (scRNA-seq) has emerged as a valuable tool to investigate the transcriptomes of individual cells within heterogenous tissues. The approach overcomes species-speci c regent limitations by relying on a universal transcript capture method that is only limited by the completeness of genome annotations 11 . Importantly, the human scRNA-seq landscapes of primary, recurrent, and metastatic OS have recently been described and act as a point of reference for cell type homology analysis between canine and human OS 12,13 . The aim of the current study was to use scRNAseq to complete a molecular dissection of the canine OS TME and evaluate cell type transcriptomic homologies between humans and dogs.
In the study, we generated a single-cell RNA sequencing atlas of six treatment-naïve dogs with primary osteosarcoma. Our analysis reveals the presence of 41 transcriptomically distinct cell types in canine OS and provides evidence of conserved cell type gene signatures and composition between human and canine OS. Overall, the data generated here can be used to inform the identi cation of conserved OS TME features and facilitate further study of the canine osteosarcoma tumor microenvironment.

Results
Establishment of a treatment-naïve canine osteosarcoma reference database To establish a treatment-naïve canine osteosarcoma reference, we completed single-cell RNA sequencing (scRNA-seq) on 6 dogs and collected data on a total of 35,310 cells. The average number of cells collected per tumor was 5,885 and on average each cell was sequenced to a depth of 72,649 reads per cell. All tumors were con rmed to be osteosarcoma by histology and histological subtyping was completed on each tumor. In total, the curated dataset consisted of 1 broblastic, 1 chondroblast, and 4 osteoblastic tumors, with one dog exhibiting radiographical evidence of lung metastasis (Table 1). Initial low-resolution cell type annotation revealed the presence of 7 major cell types consisting of T cells, B cells, tumor in ltrating monocytes (TIMs)/tumor associated macrophage (TAMs), dendritic cells (DCs), osteoclasts (OCs), tumor cells, cycling tumor cells, and an additional 3 minor cell populations consisting of neutrophils, mast cells, and endothelial cells (Fig. 1a). Evaluation of the dataset for evidence of batch effects indicated uniform distribution of cell types between biological replicates. The one exception was that naïve dog 6 had more neutrophils and T cells relative to the other study dogs (Fig. 1b). This skew might be a result of sampling bias in which necrotic tumor, blood, or bone marrow contamination was introduced during sampling. Subsequent analysis of cell type proportions revealed 42.3% of the dataset consisted of tumor cells or broblasts, 2.1% of the dataset was endothelial cells, and the remaining 55.6% was composed of immune cells (Fig. 1c).
Cell types were annotated based on expression of canonical markers, reference mapping using a human OS dataset, and gene set enrichment analysis (Fig. 1d). Cell type gene lists used by Liu et. al to de ne cell populations in human OS were applied using module scoring to provide further support for cell classi cations (Supplemental Fig. 1a) 12 . These approaches consistently enabled the identi cation of T cells, B cells, osteoclasts, and endothelial cells. However, our high-level unsupervised clustering failed to distinguish between stromal broblasts and malignant osteoblasts. This unexpected observation may be in part due to the presence of a broblastic osteosarcoma tumor in our dataset and the broad expression of broblast markers (FAP, FBLN1) across all tumor cell clusters (Supplemental Fig. 1b).
Due to the inability to identify a distinct broblast population using feature expression, we applied CopyKAT to complete copy number variation (CNV) analysis to infer which cells exhibited aneuploidy based on their transcriptional properties (Fig. 1e) 14 . The analysis revealed that the majority of cells in the tumor/ broblast cluster exhibited evidence of CNV aberrations with only a small subset of cells predicted to be diploid ( Fig. 1e; purple arrow). The diploid cells were determined to represent a small cluster of broblasts which were investigated further through independent reclustering.
Dissection of the tumor and stromal populations reveals a distinct broblast cluster Independent reclustering on cycling tumor cells and tumor/ broblasts identi ed 10 distinct cell clusters which we de ned as 4 cycling malignant osteoblasts clusters, 5 non-cycling malignant osteoblast clusters, and 1 broblast cluster (Fig. 2a). The de ning features for each cluster were identi ed using a Wilcoxon Rank Sum test and the top 3-5 unique features were visualized using a heatmap and feature plots ( Fig. 2b/c). Overall, the malignant osteoblasts exhibited a unique gene expression pro le with collagen genes and alkaline phosphatase (ALPL) contributing to the gene signatures. We observed a small cluster of tumor cells (c9) that exhibited a gene expression pattern (OAS1, ISG15, OAS2) consistent with an interferon (IFN) response gene signature (Fig. 2b). This observation was further supported through completion of GSEA using Hallmarks gene set terms (Fig. 2d). Similar IFN signature enriched cells have been reported among immune cells, but the observation of such a cluster in a tumor population has not been previously reported in human OS studies 12,13,15,16 . Interpretation of GSEA further revealed that broblasts (c6) exhibited the most pronounced "epithelial-mesenchymal transition" (EMT) and "angiogenesis" signatures, which suggests the broblasts might play a role in promoting tumor growth. Additionally, GSEA supported the annotation of hypoxic osteoblasts (c4), as the cluster exhibited the strongest hypoxic transcriptomic signature.
To con rm the identi cation of broblasts, we used module scoring with a human broblast gene list 17 . This analysis con rmed Cluster 6 exhibited the strongest broblast gene signature (Supplemental Fig. 2a). We then completed differential gene expression (DGE) analysis contrasting broblasts (c6) and non-hypoxic osteoblasts (c0, c1, and c2) to better de ne the canine broblast gene signature ( Fig. 2e; Supplemental data 1). While key broblast markers such as FAP and ACTA2 were identi ed, the top features consisted of SFRP2 and PRSS23 which were recently reported to be associated with a broblast population involved in wound healing 18 . To conclude our analysis of tumor cells, we sought to further investigate the transcriptomic signature of hypoxic osteoblasts (c4) by contrasting with non-hypoxic osteoblasts (c0, c1, and c2). Few differentially expressed genes were identi ed, suggesting cell types are similar, but subsequent pathway analysis identi ed enrichment of "response to oxygen levels" to be a top enriched pathway, suggesting that the tumor cells were indeed hypoxic (Fig. 2f, Supplemental data 1, Supplemental Fig. 2b). In summary, we were able to resolve a population of broblasts through completion of independent reclustering, as well as de ne the transcriptional heterogeneity within malignant osteoblasts.
Independent reclustering reveals a population of CXCL13 + follicular helper CD4 T cells To ensure we captured all biologically relevant T cell populations, we completed independent reclustering, which led to the identi cation of 10 transcriptomically distinct clusters: 3 CD8 T cell, 4 CD4 T cell, 1 NK cell, and 2 mixed CD4/CD8 T cell clusters ( Fig. 3a/b). Next, we interrogated T cell subtypes using an approach that has been applied in human breast cancer and OS scRNA-seq datasets to describe T cell populations 13,19 . We modi ed the gene lists used in previous applications to include signatures for cycling T cells, NK cells, and IFN-signature T cells that we recently established in circulating canine leukocytes 20 . Overall, this approach proved to be consistent with annotations assigned using canonical markers (Fig. 3c). Although the gene signatures were de nitive for naïve CD4 T cells and cytotoxic CD8 T cells, other gene signature scores provided weaker support for their corresponding cell types. For instance, regulatory T cells (T regs ) and follicular helper CD4 T cells (CD4 fh ) both exhibited moderate enrichment for exhausted and costimulatory terms, with minimal distinction between the two T cell types. The analysis also revealed the presence of a T cell cluster with an IFN gene signature, a population that has been reported to be hypersensitive to stimulation 16 .
After identifying each T cell subset, we completed pseudobulk conversion and DGE analysis to further establish the transcriptomic signatures of T regs and CD4 fh . Comparisons between T regs (c3) and activated CD4 T cells (c2) revealed overexpression of IL21R, TNFRSF4, and TNFRSF18, with CTLA4 being the most de nitive marker of T regs (Fig. 3d, Supplemental data 1). When repeating this analysis on CD4 fh (c5) cells, we identi ed CXCL13, IL4I1, and TMEM176A to be de ning features (Fig. 3e, Supplemental data 1). Although intratumoral CD4 fh cells exhibited a distinct exhaustion pro le (PDCD1, TOX, TOX2, IL4I1), they also displayed a gene signature consistent with follicular helper T cells (CXCL13, IL21, CD70) 21,22 . A similar population of CXCL13 + CD4 fh T cells has been identi ed in multiple human tumors and the cell type has been implicated in tertiary lymphoid follicle formation and modi cation of intratumoral adaptive immune responses [23][24][25] . Our analysis con rms expression of CTLA4/FOXP3 on T regs and CXCL13/IL21 on CD4 fh is conserved across species, while also providing complete gene signatures for the canine T cell subtypes 17 . Mature regulatory dendritic cells are present in canine OS and are predicted to modulate T cell mediated immunity Five dendritic cell (DC) subtypes were identi ed when completing independent reclustering on FLT3 + cells. The subtypes identi ed included conventional DC2s (cDC2; c0), mature regulatory DCs (c1; mregDC), cDC1s (c2), plasmacytoid DCs (c3; pDC), and precursor DCs (c4; preDC) ( Fig. 4a). Key features used to assign cell type identities included DNASE1L3 (cDC1), CCR7/IL4I1 (mregDCs), CD1C (cDC2), IL3RA (preDC), and IGKC (pDC) (Fig. 4b) 27,28 . The population of canine preDCs closely resembled a recently rede ned human preDC cell type that exhibits a tendency to cluster with pDCs when investigated using scRNA-seq 27 . Of note, we previously identi ed cDC2, cDC1, preDC, and pDC cell types in canine peripheral blood, however mregDCs (c1) were not observed, suggesting a potential tissue speci city 20 . The identi cation of mregDCs, also reported as migratory (mig) DCs, is of note as this cell type is predicted to modulate T cell responses 29,30 . Thus, we provide evidence that a key immune regulatory cell type is present in canine OS.
We next used hierarchical clustering and Toll-like receptor expression to investigate differences between preDCs and pDCs. Hierarchical clustering indicated preDCs are closely related to myeloid cDC2s and cDC1s, suggesting a myeloid lineage, while the pDCs were located on a unique clade, suggesting a lymphoid origin (Fig. 4c). In humans, pDCs are reported to exhibit high expression of TLR9 and TLR7, which we identi ed to be highly expressed on canine pDCs (Fig. 4d) 31 . To ensure that none of the DC populations were of B cell origin, we evaluated MS4A1 (CD20) expression and found it to be minimally expressed (Supplemental Fig. 4a). We then used SCENIC to predict active regulons in each DC subtype (Supplemental Fig. 4b). This analysis revealed RUNX2, a master regulator of pDC development, to be enriched in both pDCs and preDCs 32 . Overall, these ndings suggest canine preDCs are closely related to the recently de ned plasmacytoid-like human preDCs 27 .
To con rm mregDCs exhibited a mature, immune regulatory transcriptomic signature, we used module scoring with gene lists previously applied to investigate human DC subtypes 29,33 . This analysis revealed that mregDCs had a marked enrichment for migration, regulatory, and maturation associated gene signatures (Fig. 4e). Subsequent, DGE analysis of canine mregDCs (c1) relative to cDC2s (c0) revealed a distinct mregDC signature of CCR7, IL4I1, CCL19, and FSCN1 with substantial overlap to the human mregDC transcriptional program (Fig. 4f, Supplemental data 1) 29 . With the precedent that mregDCs interact with intratumoral T cells to shape adaptive immune responses in humans, we wanted to determine whether a similar interaction might occur between mregDC and T cells in dogs 23,24 . We used CellChat to evaluate interactions between mregDCs and T/NK cells 34 . This analysis revealed enriched PD-1/PD-L1 and CTLA4/CD80 interactions between mregDCs and CD4 T regs , T fh cells, and naïve T cells ( Fig. 4g). In summary, we present the transcriptomic signature of canine mregDCs and provide evidence of intratumoral interactions between canine mregDCs and T cells.
Macrophage transcriptomic states supports a spectrum of cell types Due to the transcriptional overlap between tumor associated macrophages (TAMs) and osteoclasts (OCs), we analyzed these two cell types in the same UMAP space. In doing so, our analysis highlighted the relatedness of OCs and TAMs which would have been overlooked if analyzed independently. Through independent reclustering we identi ed 8 transcriptomically distinct macrophage/monocyte populations which were annotated using modi ed nomenclature derived from Ma et al (Fig. 5a-c) 35  In addition to the recently proposed TAM nomenclature, we used module scoring with pro-and antiin ammatory gene lists to investigate the macrophage populations in a more traditional dichotomy (Supplemental table 2) 37 . We identi ed the C1QC hi LA-TAM (c3) cluster to have the strongest antiin ammatory transcriptomic signature while CD4 + monocytes (c11) exhibited the most prominent proin ammatory transcriptomic signature (Fig. 5e). To further investigate the signatures of Clusters 11 and 3 we completed pseudobulk-based DGE analysis ( Fig. 5f, Supplemental data 1). The genes upregulated in Cluster 11 exhibited overlap with the prede ned gene set that was used to identify the cluster as proin ammatory, while also revealing IL-1B, S100A12, LTF, and VCAN as de ning features. DGE analysis of the anti-in ammatory cluster (c3) exhibited less overlap with the gene list originally used to identify the cluster as anti-in ammatory (with MRC1 the only overlapping feature), but analysis further revealed APOE, IGF1, and complement receptors C1QA/B/C as enriched markers. The top features identi ed when contrasting Clusters 11 and 3 were then used to generate a heatmap to evaluate how the expression of these features varied across all macrophage clusters (Fig. 5g). Findings from the analysis suggested that there is a spectrum of macrophage phenotypes, which is consistent with human macrophage literature 38 .
As such, we next sought to better de ne the heterogeneity of the macrophage populations without relying on prede ned cell type gene signatures.
Gene set enrichment analysis was used to provide further insights into the inferred functional capacity of each macrophage subtype (Fig. 5h). Cell clusters 4, 7, and 11 clustered together based on pathway enrichment scores suggesting the three transcriptionally distinct clusters have similar underlying gene signatures. LA-TAMs (c2, c3) and intermediate TAMs (c1) exhibited the strongest scavenger receptor associated activation, suggesting a mature macrophage population with immune suppressive properties 39 . Several terms were identi ed suggesting that both SPP2 hi LA-TAMS and intermediate TAMs preferentially utilize oxidative phosphorylation and mitochondrial metabolic pathways. C1QC hi LA-TAMs had a distinct pro le suggestive of lipid and polysaccharide metabolism. Lastly, GSEA con rmed c10 to be consistent with IFN-TAMs based on strong enrichment of IFN signaling associated terms. In summary, we described the transcriptional pro les of macrophages in the canine OS TME which provides a foundation for further investigation of the functional relevance of each cell type.
Analysis of osteoclasts reveals four transcriptomically distinct populations Using the same UMAP space, we next shifted our focus to further characterize osteoclast heterogeneity. Consistent with human and murine reports using single-cell RNA sequencing to characterize OCs, we identi ed 4 transcriptiomically distinct OC populations 12,13,40 . The cycling OCs (c5/c8) in our canine OS dataset likely correspond to previously reported pre/progenitor OCs, while the mature OCs (c6) are consistent with previous reports (Fig. 6a/b). CD320 + OCs (transcobalamin receptor expressing OCs, c9) have not been described in macrophage or osteoclast clusters from human and mouse tissues and may represent a canine speci c cell type, or more likely a previously unresolved OC subtype ( Fig. 6a/b). Due to the similarity of OCs and macrophages we completed hierarchical clustering to con rm the unsupervised clustering results (Fig. 6c). The secondary analysis was consistent with unsupervised clustering and further suggested Clusters 5, 6, 8, and 9 are distinct from the macrophage clusters.
To con rm the mature OC classi cation and provide a canine speci c transcriptomic signature, we completed DGE analysis. When comparing mature OCs (c6) to macrophages (c0, c1, c2, c3) we identi ed canine mature OCs to be de ned by ATP6V1C1, CD84, HYAL1, and CAMTA2 expression, with subsequent GSEA analysis suggesting an association with bone resorption and remodeling (Fig. 6d, Supplemental data 1, Supplemental Fig. 5a).
We next completed DGE analysis contrasting CD320 + OCs (c9) with macrophages and mature OCs.
(Supplemental data 1, Supplemental Fig. 5b/c). By evaluating the intersection of the differentially expressed genes we determined CD320 + OCs are de ned by HMGA1, TNIP3, and CD320 expression (Fig. 6e). The analysis also provided further evidence that c9 is an OC cluster based on TNFRSF11A (RANK) enrichment when contrasted with macrophage, but not when contrasted to mature OCs 41 . Lastly, we used SCENIC's regulon speci city scoring to better de ne the transcription factors active in mature OCs and CD320 + OCs. We identi ed ZEB1 and NFATC1, known regulators of OC development, to be enriched in mature OCs, while TCF4, IRF5, and TP53 were enriched in CD320 + OCs ( Fig. 6f/g) 42,43 .
Together this suggests, CD320 + OCs are a distinct population from mature OCs and may represent macrophage-like OC precursors.
Transcript abundance of widely used immunohistochemistry macrophage markers exhibit distinct speci city to myeloid cells In contrast to other tumor types, there have been multiple reports in humans and dogs suggesting that increased TAM in ltrates in OS are correlated with reduced metastasis rates and increased patient survival 8, 9 . Despite these reports, other groups completing similar analysis have concluded that increased macrophage in ltrates have a negative impact on OS clinical outcomes 44 . Given the con icting nature of previous reports we sought to employ our dataset to investigate which cell types express the transcript of the prototypical macrophage markers used for IHC analysis in these previous studies. To complete this analysis, we pro led TIMs, TAMs, DCs, and OCs for the expression of widely used canine (MSR1 aka CD204 and AIF1 aka Iba1) and human (CD163 and CD68) macrophage markers (Fig. 7a). With the caveat that this analysis is limited to transcript abundance and does not evaluate protein expression, we found that CD163 transcript expression was the most speci c for macrophages. CD68 expression was detected in TIMs, TAMs, and OCs, with a remarkably high expression levels in mature OCs. The expression of CD68 on mature OCs is consistent with human literature 45 . AIF1 (Iba1) was the most non-speci c marker with diffuse expression across all cell types, except for mature OCs. Lastly, CD204 (MSR1) was determined to be largely speci c to TAMs, but the expression also extended to CD320 + OCs and CD4 + monocytes. To investigate the translational relevance of this nding, we evaluated expression of the markers in human OS (Supplemental Fig. 6). We observed similar expression patterns, with marked variability in speci city of each marker, suggesting the variability is conserved across species.
Given the degree of heterogeneity within the myeloid compartment in the OS TME, we used a Wilcoxon Rank Sum test to identify features that de ne each cell type, then selected for features predicted to be expressed on the cell surface (Fig. 7b, Supplemental Fig. 7, Supplemental data 3). Overall, the analysis suggested there is substantial overlap in expression of most features. Despite the overlap, we were able to identify candidate markers which include ADAM28 for LA-TAM_C1QC hi , TNFSF13B for IFN-TAMs, and CD84 for mature OCs. Lastly, we calculated the relative percentages of each cell type to further facilitate cell identi cation (Fig. 7c, Supplemental table 3). Together, the data presented here act as a foundation to further investigate the role of myeloid cells in OS biology.

Cell-cell interaction analysis indicates TAMs are involved in immune regulatory pathways
Following cell identi cation through independent reclustering of major cell types, we evaluated the cellcell interaction networks using CellChat. Between the 41 cell types included in the analysis, we identi ed a total of 15,405 inferred interactions across 61 signaling networks. The number of interactions and the predicted interaction strength of incoming (express receptor) versus outgoing (express ligand) signals were used to infer the activity of cells within the TME (Fig. 8a, Supplemental Fig. 8a). The top three cell types predicted to have the strongest interactions were broblasts, mature OCs, and endothelial cells. We next categorized the signi cantly enriched networks as "immune speci c", "immune related", and "nonimmune" to investigate if certain cell types were more active in a subset of networks (Fig. 8b,  Supplemental table 4). We found that malignant osteoblasts and stromal cells were largely predicted to be involved in "non-immune" interactions, while "immune speci c" interactions were largely con ned to TAMs and DCs with strong outgoing interactions.
By subsetting on immune cells and evaluating interactions of known immune regulatory pathways we identi ed mregDCs and IFN-TAMs to have the most interactions, while activated (CD5L + ) macrophages and C1QC hi LA-TAMs were predicted to have the strongest outgoing signals (Fig. 8c). It was further predicted that follicular helper and regulatory CD4 T cells make up the populations receiving most of the signals originating from myeloid cells. When evaluating the PD-L1 network, we identi ed mregDCs, TIMs, and IFN-TAMs to have the highest expression of PD-L1 and were predicted to interact with T fh , T regs , and exhausted CD8 T cells (Fig. 8d, Supplemental Fig. 8b). The CD80 and CD86 networks involved a larger portion of myeloid cells, with all CD4 T cells predicted to be in uenced by the interactions (Fig. 8e, Supplemental Fig. 8c/d). Overall, activated TAMs, IFN-TAMs, and C1QC hi LA-TAMs are predicted to be key contributors to the suppression of T cell mediated immunity.
Comparison of human and canine scRNA-seq OS datasets reveal a high degree of similarity in cell type gene signatures between species Lastly, we obtained 6 publicly available treatment-naïve human OS scRNA-seq samples to complete a cross-species analysis (GSE162454) 12 . The two datasets were integrated using a SCTransform work ow which is reported to overcome genome annotation differences between species 46 . Hierarchical clustering of SCT normalized data revealed a high degree of similarities between species, with major clades containing similar cell types based on pre-integration annotations (Fig. 9a). All canine lymphocyte subtypes paired 1:1 with their human counterpart, as did endothelial cells and broblasts. Discrepancies between species included the placement of mast cells, which clustered into separate clades. Overall, macrophages clustered in the same clade, but due to differences in annotation levels, many cell types did not pair off into terminal clades.
To further compare transcriptional programs across species we used an analysis approach adopted from Scheyltjens et al 47 . Brie y, the approach used DGE analysis between two cell populations in each species, then signing the adjusted P value to determine if transcriptomic signatures were conserved. When contrasting broblasts and endothelial cells, we found substantial overlap in gene expression patterns with key endothelial cell markers (PLVAP, CD34, and PECAM1) enriched in both species (Fig. 9b, Supplemental data 4). Top features conserved in broblasts included VCAN, COL6A1, and LUM, while key features such as FAP and ACTA2 were also conserved. Interesting discrepancies included the expression of HYAL2 and NOTCH as de ning features in human endothelial cells, but nonsigni cant in canine endothelial cells.
Completion of the same analysis on plasmacytoid DCs and cDC2s revealed TCF4 to be enriched in pDCs and BATF expression enriched in cDC2s, which is consistent with human literature (Fig. 9c, Supplemental data 5) 28 . An intriguing distinction between species included the high expression of GZMB and PTGDS (prostaglandin D2 synthase) in human pDCs, but not in canine pDCs. Lastly, we applied the same approach to compare mature OCs with TIMs (Fig. 9d, Supplemental data 6). As expected, mature osteoclasts were de ned by CSTK, ACP5, and ATP6V0D2 expression, while monocytes in both species were de ned by CXCL8, OSM, and LYZ expression. Notable differences included canine monocytes exhibiting high expression of SLAMF9 and PLBD1, while human monocytes had high S100A8 and HCST expression. In summary, we present a comprehensive comparison of human and canine OS cell types, which suggests a high degree of consistency in cell type gene signatures across the two species, however we also present evidence of distinct transcriptional programs in pDCs, mast cells, and monocytes.

Discussion
In the present study, we completed a comprehensive analysis of canine osteosarcoma (OS) using singlecell RNA sequencing which revealed the complex network of cells within the tumor microenvironment (TME). Through analysis of 6 treatment-naïve canine OS samples we were able to identify 30 distinct immune cell types, 9 unique malignant osteoblast populations, 1 cluster of broblasts, and 1 population of endothelial cells (Supplemental data 7). We described the transcriptomic heterogeneity within malignant osteoblasts, identi ed cell types that have not been previously reported in dogs, and applied our data set to investigate the transcript abundance of widely used macrophage surface markers.
Ultimately, the data presented here act as a molecular roadmap of the canine OS tumor microenvironment and help to overcome the reagent and technical limitations associated with using the dog as a model.
Prior to this study, evidence of a conserved OS TME between humans and dogs has been limited. By obtaining a publicly available human OS dataset we were able to directly compare the cellular Our analysis revealed the presence of many rare cell populations, including mregDCs, CD4 fh T cells, and IFN-TAMs, opening avenues for further investigation of these cell populations using the transcriptomic signatures identi ed in this study. With the caveat that transcript expression may not correlate with protein expression, we used the surfaceome reference database to identify possible surface markers for further study of these cell types 26 . In addition to antibody-based assays, the transcriptomic signatures presented here provide a reference for the application of deconvolution algorithms (such as CIBERSORTx and TIMER) when evaluating bulk RNA sequencing data obtained from canine OS samples 51,52 .
Mature regulatory dendritic cells represent a recently de ned cell type which has been identi ed across several human tumor types, including OS 30,53 . The biological role of mregDCs is still being identi ed, but recent reports suggest a potential role in shaping T cell antitumor immune responses 23,24 . In our analysis, we were able to identify a CCR7 + /IL4I1 + /FSCN1 + dendritic cell population which closely resembles the descriptions of human mregDCs. We found that canine mregDCs express high levels of immune suppressive and costimulatory molecules, which may play a role in modulation of adaptive immune responses through communication with follicular helper and regulatory T cells. This study provides evidence that mregDCs are present in canine OS, demonstrating a conserved role within osteosarcoma across species.
The heterogeneity within the myeloid compartment of tumors is only beginning to be understood and the role of TAMs in OS clinical outcomes is debated 54 . To provide further context for the discrepant ndings reported in the literature, we evaluated the transcript abundances of key macrophage markers used in human and canine analysis. Although the analysis was completed at the transcript level, we observed notable differences in the speci city of each cell type marker within myeloid population. Inconsistences between cell types evaluated using immunohistochemistry could explain why some groups identify negative prognostic correlates while other groups report positive outcomes. Further validation of the variability in cell type markers should be completed using re ned immunohistochemistry panels coupled with spatial transcriptomics. Ultimately, a better understanding of the prognostic and functional roles of myeloid cells within the TME will aid in the development of effective targeted therapeutics.
While the single-cell RNA atlas presented here provides key insights into canine OS, the dataset is not without limitations. First, although we sampled male and female dogs across a range of ages, our dataset still only consisted of 6 dogs and may not fully represent all cell populations found in canine OS.
Secondly, the tumor sample obtained from one dog (dog 6) exhibited markedly more neutrophils relative to other samples which may suggest sample contamination with blood, bone marrow, or necrotic tissue. Lastly, cellular annotations largely relied on human gene signatures due to the lack of canine speci c data available. This may have exaggerated similarities between species and may have resulted in the forcing of distinct canine-speci c cell types into human nomenclature subtypes. Thus, the discordant ndings regarding mast cell gene signatures represents an important distinction that should be investigated further and considered when using the dog as a model for human disease.
The data presented here represent a valuable resource for comparative oncology research. A major goal of this project is to make the data accessible to the greater research community and multiple avenues are provided for researchers to explore and use the dataset (see data availability statement). Our comparisons between human and canine OS revealed the conserved nature of cell type gene signatures in OS while also identifying potential differences. Overall, our analysis supports the dog as a model for human OS and provides a novel reference dataset that can be used to increase the value of canine immuno-oncology research.

Study Animals
Dogs for the study were selected based on the presence of an appendicular primary tumor and the absence of previous therapeutic intervention. All dogs presented with radiographic evidence of OS and subsequent histopathology was completed to con rm the diagnosis. Dog demographics are presented in Table 1. All study dogs underwent amputation of the affected limb and samples were collected for singlecell RNA sequencing processing within 30 minutes. All studies were approved by the Colorado State

Library preparation and sequencing
Single cells were isolated and tagged with unique cell barcodes using a Chromium iX instrument with a target of 5,000 cells per sample. Two of the six dogs (dogs 1 and 2) had two samples processed each with a 5,000-cell target, for a total target of 10,000 cells. Single cells were isolated and processed using a Chromium Next GEM Single Cell 3 Kit v3.1 following manufacture recommended protocols. Individual cell transcriptomes were captured and labeled with molecular barcodes, then a standard Illumina library preparation was completed using a dual index library construction kit (10x Genomics). Samples quality was analyzed using a LabChip (PerkinElmer; Waltham, MA) and submitted for sequencing on an Illumina NovaSeq 6000 sequencer (Novogene Corporation; Sacramento, CA) with a target of 100,000 150 bp paired-end reads per cell. Raw data were demultiplexed by the sequencing core then transferred for downstream analysis.

Read mapping and quanti cation
A Cell Ranger analysis pipeline (version 6.1.2, 10x Genomics) was utilized to process raw FASTQ sequencing data, align reads to the canine genome, and generate a count matrix. The default settings were used when running "cellranger count" and aligned to a CanFam3.1 reference prepared as previously described 20 .

Data ltering and integration
For each sample, the count matrix was imported into R using the Read10X() function then converted to a Seurat object using the CreateSeuratObject() function 17 . To estimate the number of dead/poor quality cells, the percentage of mitochondrial reads per cell was calculated using PercentageFeatureSet() to count all reads mapped to features with the pre x "MT-". Each object was ltered to only retain cells which met the following requirements: 200 < nFeature_RNA < 5500, percent.mt < 12.5, and 100 < nCount_RNA < 75000. Next, DoubletFinder, was used to identify and remove putative cell doublets 55 . After completing QC ltering on each sample, all samples were integrated into one object using the SCTransform() and integration work ow 46 . During this step, we regressed out the percent mitochondrial reads to minimize the impacts on clustering results and used 2000 features as integration anchors. Following data integration, three low quality clusters were identi ed and removed then the samples were split out to repeat data integration. Ideal clustering parameters (res = 0.8, dims = 45, n.neighbors = 40, min.dist = 0.35) were determined using the R package clustree 56 . Dimension reduction and visualization was completed, and the data were projected using 2-dimensional, non-linear uniform manifold approximation and projection (UMAP) plots.

Cell classi cation
High level cell type annotations were established using unsupervised clustering results, gene set enrichment analysis, and manual annotation based on the literature for accepted human cell type markers 57 . Brie y, markers included CD3E for T cells, CTSK for osteoclasts, CD68 for macrophages, S100A12 for neutrophils, COL1A1/ALPL/FAP for tumor/ broblasts, FLT3 for dendritic cells, Differential gene expression analysis Differential gene expression (DGE) analysis was completed using pseudobulk conversion followed by a DESeq2 pipeline 58 . Prior to running DESeq2, low abundance features, de ned as features with less than 10 raw counts across all cells sampled, were ltered out. Features that had an adjusted P value of less than 0.05 (as determined using a Benjamini and Hochberg correction method) and a log2(fold change) greater than 0.58 were considered to be statistically signi cant.

Gene set enrichment analysis
When completing follow-up gene set enrichment analysis (GSEA) on the gene lists generated from DGE analysis, the signi cantly upregulated and downregulated features were processed separately. The upregulated and downregulated gene lists were used with clusterPro ler and MSigDB gene sets to infer pathway activity 59,60 . Terms which reached an adjusted P value of 0.05 or lower (Benjamini and Hochberg correction method) were discussed as signi cantly enriched.
In addition to using GSEA following DGE analysis we also used the R package singleseqgset to complete GSEA on cell type clusters. The tool uses a competitive gene set enrichment test that was based on a Correlation Adjusted MEan RAnk gene set test 61 . The log2(fold change) and mean expression for every feature within each cell type was calculated and used to complete GSEA. P values were corrected for multiple comparisons using a false discovery rate (FDR) method and corrected P values were ltered to only retain terms in which at least one cell types had a value less than 0.05. The enrichment values were scaled and the top pathways (weighted by P value) were plotted using a heatmap.

Copy number variation analysis
Copy number variation (CNV) analysis was completed using CopyKAT on high quality cells that contained more than 2000 unique molecular identi ers (UMIs) 14 . Brie y, the approach segments the genome into 220-kb variable genomic bins to establish a genome-wide copy number pro le for each single cell at an approximate resolution of 5 Mb. Eash sample was run individually with a known normal cell population consisting of osteoclasts, neutrophils, macrophages, and T cells when inferring CNV status. Individual cell classi cations were extracted from each .rds output le, then transferred to the Seurat object containing the integrated data. The CNV status was visualized on a UMAP. This approach was only used to infer if a cell was aneuploid or diploid and individual chromosome mutations were not evaluated due incompatibilities of software across species.

Regulon activity
Single-cell regulatory network inference and clustering (SCENIC) was used to infer activity of gene regulatory networks by cell types 62 . The two regulatory feather les used for analysis were obtained from https://resources.aertslab.org/cistarget/ and were named "hg19-500bp-upstream-7species.mc9nr.feather" and "hg19-tss-centered-10kb-10species.mc9nr.feather". Default settings were then applied to run the analysis pipeline. Regulons speci city scores (rss) were calculated using AUCell and the rss values were used to infer regulon activity in the cell types analyzed 63 .

Human OS homology analysis
Six treatment-naive human OS samples were obtained from the NCBI GEO database accession GSE162454 12 . The count matrices reported from the previous study were loaded in as Seurat objects and were ltered using the same parameters as used for the 6 canine OS tumor samples. The human dataset was annotated using high-level unsupervised clustering while referencing the primary article in an attempt to recreate the original annotations. Following annotation of each species, the 12 (6 human and 6 canine) OS samples were integrated into one object to using a SCTranscfrom work ow with 3000 variable features as anchors. Only features with homologues across both species were used for integration. SCT normalized counts were then used to complete hierarchical clustering using the hclust() function with method set to "complete". Subsequent DGE analysis contrasting cell types within each species was completed within individual species datasets. The adjusted P values obtained from DGE analysis were assigned a sign (+/-) based on the log2(fold change) then the signed P values were used to generate a scatter plot.
Data and software availability