Gene expression of Tasmanian devil facial tumors differs among host populations

Transmissible cancers lie at the intersection of oncology and infectious disease, two traditionally divergent elds for which gene expression studies are particularly useful for identifying the molecular basis of phenotypic variation. In oncology, transcriptomics studies, which characterize the expression of thousands of genes, have identied processes leading to heterogeneity in cancer phenotypes and individual prognoses. More generally, transcriptomics studies of infectious diseases characterize interactions between host, pathogen, and environment to better predict population-level outcomes. Tasmanian devils have been impacted dramatically by a transmissible cancer (devil facial tumor disease; DFTD) that has led to widespread population declines. Despite initial predictions of extinction, populations have persisted at low levels, due in part to heterogeneity in host responses, particularly between sexes. However, the processes underlying this variation remain unknown.


Results
We sequenced transcriptomes from healthy and DFTD-infected devils, as well as DFTD tumors, to characterize host responses to DFTD infection, identify differing host-tumor molecular interactions between sexes, and investigate the extent to which tumor gene expression varies among host populations. We found minimal variation in gene expression of devil lip tissues, either with respect to DFTD infection status or sex. However, 4,088 genes were differentially expressed in tumors among our sampling localities. Pathways that were up-or downregulated in DFTD tumors relative to normal tissues exhibited the same patterns of expression with greater intensity in tumors from localities that experienced DFTD for longer. No RNA sequence variants were associated with expression variation.

Conclusions
Expression variation among localities may underly phenotypic variation in the tumor, potentially manifesting in morphological differences that alter ratios of normal-to-tumor cells within biopsies.
Phenotypic variation in tumors may arise from environmental variation or differences in host immune response that were undetectable in lip biopsies, potentially re ecting variation in host-tumor coevolutionary relationships among sites that differ in the time since DFTD arrival.

Background
Identi cation of the processes underlying cancer development as well as those associated with heterogeneity in its progression is critical for predicting both individual and population-level outcomes [1,2]. Measuring relative levels of gene expression has been key for identifying genes and biological pathways associated with cancers and heterogeneity in cancer phenotypes [3][4][5]. For example, gene expression studies have shown regulatory dysfunctions associated with tumorigenesis [6], identi ed expression pro les associated with therapy resistance and poor prognoses [7][8][9], and provided insights into interactions between tumor cells and the immune system [10]. Similarly, gene expression analyses are frequently used to understand how organisms respond to environmental stressors, such as thermal stress [11], pollutants [12], and infections [13]. Variation in gene expression among individuals can also indicate how biotic and abiotic pressures underlie population-level responses. For example, temperaturedependent susceptibility of salamanders to Batrachochytrium dendrobatidis (Bd) is driven by temperature-mediated shifts in innate versus adaptive immune gene expression, enabling predictions of Bd effects on amphibian communities under different climate change scenarios. [14]. The elds of oncology and wildlife disease intersect in the case of transmissible cancers, which are clonal, transmissible tumors that are spread among individuals by the transfer of cancerous cells [15].  [15,16]. Whereas CTVT has relatively benign effects on its hosts both at the individual and population levels, DFTD presents as an aggressive, highly virulent pathogen with nearly 100% case fatality rate and devastating effects on Tasmanian devil populations [17][18][19]. Since its discovery in 1996, DFTD has spread throughout the entire geographic range of the Tasmanian devil, leading to rapid population declines exceeding 80% [20,21]. Developing initially as a Schwann (peripheral nerve) cell cancer in a female devil [22], mutations leading to downregulation of MHC class I expression in the tumor, coupled with potential natural killer cell dysfunction in devils, enabled DFTD to evade host allograft rejection and become transmissible [23][24][25]. DFTD is transmitted via biting, a fundamental behavior in devil social interactions [26,27], with DFTD tumors manifesting externally and predominantly around the mouth and face. Following visible presentation of tumor growth, progression is rapid, leading to death within 12 months.
Despite observations of frequency-dependent transmission leading to early predictions of devil extinction [18,28], recent epidemiological models incorporating individual variation in pathogen load suggest that extinction is an unlikely outcome [29,30]. Inter-individual variation in tumor growth rates and latency periods, together with a lack of vertical transmission, often enables females to survive through their rst breeding season, allowing populations to persist at low densities despite high DFTD prevalence [29].
Female responses contributing to this success include increased precocial breeding and fecundity in the rst breeding season in DFTD-affected populations, as well as DFTD tolerance that manifests in a slower loss of body condition in DFTD-infected females relative to males [31][32][33].
Recent genomic evidence suggests rapid evolutionary responses of devils to DFTD. Genomic regions putatively associated with immune response, cancer resistance, and behavior appear to be evolving within as few as four generations (approx. 8 years) under positive selection in DFTD-affected populations [34,35]. Females may also be adapting to DFTD through greater tolerance, with several associated genes of large effect detected in a genome-wide association study [36]. Although putative functions of candidate genes identi ed in these studies suggest biological functions that may underlie variation in host tness, mechanistic differences in response to infection between sexes remain unclear. Additionally, isolated cases of spontaneous tumor regression have been observed in the eld, the molecular underpinnings of which appear to be in regulatory regions because no non-synonymous substitutions have been found in either the devils or tumors [37,38]. Further, phenotypic responses in devil populations following DFTD arrival have been observed within one or two generations [33], suggesting existing plasticity and not a purely adaptive response.
Studies comparing transcriptomic and genomic variation are ideally suited for elucidating the molecular basis for variation in devil responses to DFTD. However, previous gene expression studies of DFTD have aimed to identify cell type of origin [22] or targeted speci c sets of immune-related genes, primarily in laboratory-cultured DFTD cell lines [e.g., [38][39][40][41]. Thus, there is a need to understand variation in both host and DFTD tumor gene expression in natural populations, particularly with respect to observed variation in DFTD-associated impacts on hosts.
To investigate the role of gene expression variation in manifestation of DFTD, we performed a population transcriptomics study of wild Tasmanian devils from DFTD-affected populations. We sequenced whole RNA from normal tissues in both healthy and DFTD-infected devils, as well as tissues from DFTD tumors, and tested predictions of differential gene expression with respect to sex and population that could explain individual-and population-level responses to disease. We predicted that 1) similar to other types of cancer, DFTD exhibits gene expression that is distinct from normal host tissues, 2) the severe disease associated with DFTD infection induces signi cant responses in infected hosts as evidenced by expression differentiation between infected and uninfected hosts, 3) previously observed differences in DFTD tolerance between host sexes will be re ected in inter-sex variation in gene expression, and 4) spatial genetic variation in both devils and DFTD produces variation in gene expression across geographic localities.

Results
Sequencing, alignment, and transcript assembly

Between devil lip and DFTD tissues
For the comparative analysis of differential gene expression between lip and tumor tissues, we retained a total of 14,807 expressed genes after ltering (Additional le 8: File S8). We found a dramatic difference in gene expression between lips and tumors (11,149 signi cant differentially expressed genes at a false discovery rate [FDR] of 0.05), which were clearly delineated as multidimensional scaling clusters ( Fig. 2a; Additional le 5: Figure S5). Because the number of differentially expressed genes for this contrast was so large, we applied a log2 fold change (log2FC) minimum threshold of ± 2 and decreased the FDR to 0.01 to focus our efforts on only the most highly differentially expressed genes. With this more stringent lter, we identi ed 4,234 genes that were still differentially expressed between tissue types, with 2,286 upregulated and 1,948 downregulated in DFTD tumors.
Using the PANTHER Overrepresentation Test [43], we identi ed numerous biological processes that were signi cantly enriched among genes that were highly differentially expressed between tumor and lip tissues (Additional le 6: Figure S6). Many of these gene ontological (GO) terms were clearly associated with cellular differentiation due to the epidermal and skeletal muscle tissue present in lips as opposed to the Schwann cell (a peripheral nerve cell) origin of DFTD. For example, genes upregulated in tumors were enriched for nervous system processes and extracellular matrix organization but depleted in DNA damage response, transcription, and protein metabolism. Genes downregulated in tumors were enriched for muscle cell development and function but depleted in double-strand break repair, negative cell cycle regulation, transcription, and translation. Full lists of signi cantly over-or underrepresented GO-terms are provided in Additional le 9: File S9.
Gene Set Enrichment Analysis revealed 16 positively and 163 negatively enriched Reactome pathways in DFTD at FDR = 0.01 (Fig. 3). Summarization via EnrichmentMap clustering highlighted a variety of generalized differences between tumor and lip tissues. The largest pathway gene set cluster contained 95 pathways and was annotated "proteasome degradation signaling" (Fig. 3). This cluster contained four pathways associated with the mitotic cell cycle that were positively enriched in DFTD. The remaining pathways in this cluster were negatively enriched in DFTD, including many that were associated with the immune system and various signaling pathways such as Notch, slits and robos, and DNA damage checkpoints (both dependent and independent of P53). A closely related cluster annotated as "DNA repair HDR", contained pathways associated with homology-directed DNA repair that were positively enriched in DFTD, as well as transcriptional regulation by TP53 that was negatively enriched in DFTD (Fig. 3).
Clusters entirely positively enriched in DFTD were associated with collagen formation and extracellular matrix organization, cilium formation, and neuron function ( Fig. 3; see Additional le 10: File S10 for full list of enriched gene sets and annotated gene set clusters).

Among devil lip tissues
For comparison among lip tissues only, we retained a total of 14,165 expressed genes (Additional le 8: File S8). Overall, we found little evidence of differential gene expression between lips from devils of different sex or based on infection status, irrespective of sampling locality. No genes were differentially expressed in any of our contrasts between DFTD-infected devils and those from clinically healthy devils, irrespective of host sex (Fig. 2b). However, when contrasting host sexes directly regardless of infection status, we identi ed seven differentially expressed genes. Speci cally, we observed signi cant upregulation of FRMD7, HMGB3, MECP2, and three uncharacterized novel genes in females, and upregulation of one uncharacterized novel gene in males. No additional genes differentially expressed between males and females were identi ed when considering DFTD-infected and uninfected devils separately. Both groups exhibited signi cant up-regulation in females of FRMD7, HMGB3, and two uncharacterized novel genes, whereas MECP2 was upregulated in females only among uninfected individuals. Among localities, we identi ed 1,564 genes differentially expressed between TKN and BR (897 up and 667 downregulated in TKN) but no evidence of differential expression in lips between any other pair of localities. Genes differentially expressed between TKN and BR were associated with immune system processes and cellular developmental processes (upregulated in BR), as well as lipid metabolism (upregulated in TKN) (Additional le 9: File S9). Six gene sets were positively enriched in BR relative to TKN and were associated with phagocytosis and T-cell activation (Additional le 10: File S10). For all other lip-only contrasts, there were insu cient numbers of differentially expressed genes for GO-term enrichment analysis. Similarly, no signi cantly enriched pathway gene sets (FDR = 0.12-1.0) were identi ed for any other lip-only statistical contrasts.

Among tumor tissues
In DFTD tumors, we retained 14,204 expressed genes following ltering (Additional le 8: File S8). Genes expressed in DFTD but not lip tissues, and vice-versa, were almost entirely linked to functions associated with the cell type of origin (i.e., Schwann vs muscular/epidermal cells). When contrasting tumors infecting devils of opposite sex, we found no genes that were signi cantly differentially expressed between sexes across all localities, but we observed some locality-speci c differences. Although no genes were differentially expressed between tumors from different host sexes in BR, 17 genes were differentially expressed in tumors between sexes in TKN, all of which were upregulated in tumors from male devils relative to females: SYNE2, CLUH, PARS2, VPS13A, KIAA1586, TASOR2, SLC12A5, TRRAP, DIP2A, BRPF1, PCNX3, and six uncharacterized novel genes. In WPP, only MRPL53 was differentially expressed -upregulated in tumors infecting female devils.
Direct contrasts between localities -irrespective of host sex -revealed substantial differences in tumor gene expression. In general, the greatest difference was between BR and WPP, with TKN being somewhat intermediate of the two other localities (Fig. 2c, Additional le 7: Figure S7). In all, 3,825 genes were differentially expressed between BR and WPP, of which 1,635 were upregulated and 2,190 were downregulated in BR compared to WPP. In BR tumors relative to TKN, 414 genes were upregulated, and 548 genes were downregulated. In WPP tumors relative to TKN, 823 genes were upregulated, and 416 genes were downregulated.
All three pairwise contrasts of tumors between localities were signi cantly enriched for biological processes. All processes overrepresented among genes upregulated in BR relative to TKN were also overrepresented among genes upregulated in BR relative to WPP -these were broadly associated with translation and protein synthesis. Additional processes overrepresented among genes upregulated in BR relative to WPP were predominantly associated with other metabolic and biosynthetic processes. Genes downregulated in BR relative to WPP were disproportionately associated with chromosome organization and gene expression. Similarly, in TKN tumors relative to WPP, downregulated genes were also associated with chromosome organization and gene expression regulation, whereas upregulated genes were associated with transcription, translation, and various metabolic and biosynthetic processes.
We identi ed signi cantly enriched Reactome pathway gene sets in comparisons between DFTD tumors sampled from males and females in TKN, as well as in comparisons among tumors from different localities generally. Overall directionality of gene set enrichment indicates a pronounced difference between BR and WPP, with TKN intermediate of the two but generally more similar to BR. Interestingly, the pathway gene set clusters positively enriched among tumors from WPP were characteristic of those positively enriched in DFTD compared to devil lip tissue. We identi ed ve clusters of pathway gene sets that were signi cantly enriched in all three pairwise contrasts between localities (Fig. 4). The largest of these clusters -"regulation mediated degradation" -contained 119 pathways and was approximately equivalent to the largest cluster identi ed in the DFTD-lip contrast above. Accordingly, it contained pathways predominantly associated with cell cycle checkpoints and programmed cell death, and the innate and adaptive immune system that were all signi cantly negatively enriched in WPP tumors compared to those from TKN and BR. A small number of pathways in this cluster associated with the mitotic cell cycle were positively enriched ( Fig. 4; see Additional le 10: File S10 for full list of enriched gene sets and annotated gene set clusters).
Of the three between-locality tumor contrasts, TKN and BR tumors appeared to be the most distinct, with four pathway gene set clusters that were signi cantly enriched only in this comparison, while eight clusters were enriched only in the other two between-locality contrasts (Fig. 4). Clusters unique to the TKN-BR contrast included many pathways associated with the mitotic cell cycle (Fig. 4). Clusters that were lacking only from the TKN-BR contrast included pathways associated with protein folding and transport, metabolism, negative regulation of transcription, and autophagy (Fig. 4).
In general, we observed no enrichment of pathways among tumors sampled from devils of different sexes, either across all localities or within localities individually, with the exception of TKN. As re ected in Fig. 2c, gene set enrichment between tumors sampled from different sexes from TKN generally mirrored enrichment between WPP and BR, with TKN males negatively enriched for pathways that were also negatively enriched in WPP when compared to BR. Overall, at FDR = 0.05, there were four pathways signi cantly positively enriched and 21 negatively enriched in TKN males compared to females. Pathways positively enriched in TKN males were associated with chromatin organization and mitotic prometaphase. Pathways negatively enriched in TKN males were associated with eukaryotic and mitochondrial translation and protein synthesis, nonsense-mediated decay, and respiratory electron transport.

Genotypic variation and associations with gene expression
To identify genotypic variation potentially associated with variation in gene expression, we genotyped indels and single nucleotide polymorphisms (SNPs) from the transcriptome sequences. Following ltering (see Additional le 3: Text S3), we retained 9,559 SNPs and 6,013 indels in devils, and 473 SNPs and 1,554 indels in tumors for analysis. For both tissue types, we found evidence for weak genetic structure when specifying sample localities a priori, though this structure was more pronounced in devils than in tumors (Fig. 5). However, when identifying genetic clusters purely from SNP variation, only one cluster each was supported for both devils and tumors.
We used SnpEff [44]  However, after FDR adjustment, we found no signi cant associations between any variant on the expression of the genes it was predicted by SnpEff to effect.

Discussion
As predicted, substantial differentiation exists in gene expression between lip and tumor tissues. We observed differential expression patterns consistent with DFTD's origin as a Schwann cell tumor [22], including genes involved in nervous system functions such as synaptic signaling and ion transport, as well as characteristics common to cancers [5]. Lips, in contrast, had differentially expressed genes characteristic of muscle and epidermal tissue function. Within hosts, we found surprisingly little transcriptional variation in between the sexes, regardless of infection status, despite clear sex-biased differences in resistance and tolerance of DFTD [31,36]. Perhaps most interestingly, we found signi cant variation in gene expression among DFTD tumors collected from different localities, potentially associated with different tumor strains, environmental effects (either within-host or extrahost) or hostpathogen coevolutionary relationships. However, we found no association between detected SNP variants in transcribed genomic regions and gene expression, consistent with results from previous studies that showed variants in regulatory regions [37], which are not captured in RNA-seq approaches, likely affect observed phenotypic variation.
Gene expression characteristics of DFTD Despite considerable genotypic and phenotypic diversity among cancers [47], there are relatively common mutational drivers that are necessary for the genesis and proliferation of cancerous cells. Functions of these drivers include: the sustenance of proliferative signaling, evasion of growth suppressors, resistance to apoptosis, induction of angiogenesis, activation of invasion and metastasis, reprogramming of energy metabolism, and evasion of immune response [48,49]. Similarly, we observed differential expression of genes and enrichment of pathways in DFTD associated with the extracellular matrix, the cell cycle, DNA replication and repair, and immune function. The extracellular matrix is fundamental to maintaining tissue homeostasis. Dysregulation of the extracellular matrix can both cause -and occur as -a response to the development and proliferation of cancerous cells, facilitating uncontrolled proliferation, angiogenesis, tissue migration and invasion, and metastasis [50][51][52]. Cell cycle 'checkpoints' detect DNA damage or errors in DNA replication and chromosome organization and are regulated by critical tumor suppressor genes that prevent proliferation of defective and potentially cancerous cells [53]. A key example of such a gene is TP53, which is activated in response to DNA damage, initiates the inhibition of cell proliferation and regulates apoptosis [54]. Loss of TP53 function is common to many cancers [54,55]. We found various pathways associated with TP53, such as G1 and G2 cell cycle checkpoints, to be downregulated in DFTD relative to normal lip tissue. In turn, upregulation of pathways associated with mitosis and the cell cycle that likely lead to increased cell proliferation were also found in DFTD.
Despite downregulation of cell cycle checkpoint and apoptosis pathways, we observed upregulation of homology-directed DNA repair (HDR) in DFTD. Dysfunctional HDR often results in genomic instability and can lead to carcinogenesis [56,57]; however, cancers can become resistant to standard cancer treatments aimed at inducing DNA damage through reactivation of HDR, emphasizing that cancer cell proliferation is not dependent on HDR suppression [58]. Active yet dysfunctional HDR can lead to errors in doublestranded break repair that produce gross chromosomal rearrangements [59], such as those that are evident in DFTD [21,60,61].
Although all cancers exhibit some form of immune avoidance, DFTD and other transmissible tumors are unique in that they can evade host MHC recognition of non-self-cells [24]. In DFTD, MHC avoidance is achieved through the ERBB-STAT3 axis [41]. Speci cally, DFTD became transmissible when a normal Schwann cell tumor gained a variant exhibiting overexpression of ERBB3, which overactivates the transcription factor STAT3 and blocks the production of MHC I [23,24,41]. Without MHC I expression, DFTD cells are unable to be recognized as foreign by the host. Similar to previous work, we found the differential expression of other genes associated with STAT3, including upregulation of MMP2, HDAC5, and downregulation of PTGIS [41]. However, in contrast to this previous study that found upregulation of the TRIM28 protein, which is typically activated in response to STAT3 signaling, we detected slight downregulation in expression of this gene. TRIM28 is often overexpressed in cancer [62]; however, its expression has been shown to be predictive of tumor class in human glioblastomas [63], and positively correlated with tumor size and development stage, whilst being negatively correlated with patient survival in human hepatocellular carcinoma [64]. Lack of detectable TRIM28 overexpression may thus re ect size or developmental stage variation among the sampled DFTD tumors.
We also observed downregulation of Notch signaling, which is critical to Schwann cell development and required for differentiation of Schwann cell precursors (SCP). Notch activation upregulates ERBB2, which acts as a receptor for NRG1, which in turn is critical for transitioning SCPs to immature Schwann cells [65,66]. In the absence of Notch signaling, a scarcity of ERBB2 receptors would lead to reduced sensitivity of SCPs to NRG1, despite its overexpression. Dysfunction of these pathways suggests a decoupling of DFTD cell proliferation from typical Schwann cell developmental controls.

Gene expression in devil lip tissue is not associated with infection status or sex
DFTD infection has no effect on gene expression in devil lip tissue. This is surprising as DFTD has signi cant effects on host physiology and can elicit host immune response [67]. Tumor growth leads to increasing metabolic demands concurrent with di culty feeding, ulcerations, metastases, and secondary infections that produce a progressive loss of body condition and almost universal mortality within 12 months following visible tumor development [18,31]. In humans, gene expression in normal tissues adjacent to tumors re ects a state that is intermediate between healthy and cancerous, with commonly expressed pathways including pro-in ammatory responses induced by the tumor [68]. We chose lip tissue due to its proximity to the mouth, where DFTD allografts typically implant, believing that changes in gene expression would be greatest in tissues local to tumors. However, DFTD tumors do not always occur on the lips or in the mouth, and there was likely variation in the proximity of lip biopsies to the site of tumor growth. Ethical and experimental concerns necessitated a consistent sampling strategy for healthy tissues, preventing individual adjustments of biopsy locations to accommodate variation in tumor location. In addition, lip tissues may have been inappropriate for detecting systemic immunological or metabolic changes. For example, systemic immune responses associated with tumor growth tend to involve the accumulation of immune cells in the peripheral blood or lymphoid tissues, rather than in the epidermis [69]. Biological functions associated with immune response were not underrepresented in lip tissues, suggesting that the lack of differential expression was not speci cally due to a lack of immune expression in these tissues.
Despite a lack of a direct immune response in devil lips to DFTD infection, we observed upregulation of immune-associated genes in BR relative to TKN devils, irrespective of infection status. This difference is not necessarily associated with DFTD and may be due to differential exposure to other infectious agents or innate differences in immune function between genetically distinct populations. However, DFTD is an overwhelmingly strong selective force in devils [34,36,70] and likely drives immune adaptation in affected populations. Further, documented immune responses to DFTD [67] suggest that differential host immune expression between localities may alter the microenvironment faced by invading tumor cells. We thus recommend further gene expression studies targeting immunologically active host cells; speci cally, to investigate systemic host responses to DFTD using blood samples as well as a re ned approach for detecting localized responses by targeting healthy tissues < 1 cm from DFTD tumors.
Male and female devils exhibit different responses to DFTD, with females losing body condition at a signi cantly slower rate when infected and genetic evidence of adaptations that result in greater survival rates among females [31,36]. Although DFTD infection produced no transcriptomic response in lip tissues for either sex, we found several genes that were differentially expressed between males and females generally, regardless of infection status. Consistent with previous work comparing sexes in uninfected devils [71], the X-linked gene FRMD7 was downregulated in males (log2FC = -4.01). FRMD7 is putatively involved in fatty acid metabolism and has been associated with skin disorders, serving as a potential factor in differential susceptibility between sexes to DFTD [71]. Additionally, we found six other genes that were differentially expressed between sexes. Four of these were uncharacterized, while HMGB3 and MECP2 -both also X-linked -were downregulated in males. HMGB3 is a DNA-binding protein that helps maintain stem cell populations and is overexpressed in some human cancers via the Wnt signaling pathway [72,73]. Further, HMGB3 affects nucleic acid recognition and innate immune system activation, and its upregulation has been associated with allograft rejection [74][75][76][77]. Therefore, higher baseline expression of HMGB3 in female devils may enhance the innate immune response to DFTD relative to males. Recently, MECP2 was identi ed as an oncogene through induction of the MAPK and PI3K growth factor signaling pathways and is overexpressed in many human cancers [78]. However, it is unclear how MECP2 expression in normal host tissues could affect DFTD progression.

DFTD tumor gene expression varies geographically
We observed considerable variation in tumor gene expression among the sampled localities, which varied in the length of time since DFTD arrival. Interestingly, gene expression patterns that we observed in DFTD relative to host lips were more intense (i.e., more differentially expressed genes and greater log2 foldchanges) in WPP tumors -where DFTD has been present longest among our study sites -than in other populations. Speci cally, WPP tumors exhibited upregulation of mitosis and downregulation of translation, DNA damage checkpoints, and immune function relative to tumors from BR, which had the shortest time since DFTD arrival. Varying intensity of DFTD-characteristic gene expression may be due to differences in the ratios of normal-to-tumor cells within tumor biopsies, potentially driven by subtle differences in tumor morphology among localities (e.g., differences in the extent to which tumor tissue is delineated from the surrounding host tissue, or the distinctness of tumor margins). On the other hand, expression changes in cancer-associated genes are not only linked to tumorigenesis but can be directly correlated with tumor aggressiveness and overall prognoses in human cancer patients [79,80].
Meanwhile, incrementally greater gene expression changes through time can produce progressively more severe phenotypes or re ect more advanced stages of tumor development [e.g., 81,82]. The strength of expression in DFTD-associated pathways therefore may be associated with DFTD phenotypic variation, such as growth rates, although no such differences (nor differences in tumor morphology) between the studied localities have been documented.
Gene expression in tumors from TKN re ected an intermediate state between WPP and BR but were differentiated by sex ( Fig. 2c; Additional le 7: Figure S7). That is, tumors from TKN males exhibited patterns of gene expression that were characteristic of WPP tumors from both sexes and tumors from TKN females resembled male and female tumors from BR (see Fig. 2c). Perhaps coincidentally, previous work demonstrating more rapid body condition loss in DFTD-infected males than in females was conducted in TKN [31], while no similar analysis of host body condition has been performed for either of our other study sites. We do not have su cient serial volume measurements for the tumors in our study to directly associate gene expression with tumor growth rates. In the absence of data comparing tumor growth rates or host body condition between TKN, WPP, and BR, it is di cult to establish a link between relative levels of DFTD-characteristic gene expression the severity of disease. Additionally, multiple DFTD lineages with differing degrees of pathogenicity have been observed.
Speci cally, in WPP, DFTD arrival in 2006 was characterized by initially low mortality rates compared to localities that had experienced DFTD for longer [83]. Karyotype analysis con rmed the existence of a distinct tetraploid strain at WPP that was replaced by a more virulent diploid strain in 2012, causing an immediate increase in disease prevalence and population decline [84]. In addition, recent phylodynamic analysis indicates multiple tumor lineages exhibiting differences in transmission rates, demonstrating epidemiological variation among distinct tumor strains [85]. Given its recent emergence, our BR tumor samples may represent a novel DFTD lineage present near the advancing disease front. Such a lineage was not observed at a broad spatiotemporal scale [85] but may be evident through more intensive sampling of the most recently emerged tumors near and on the west coast of Tasmania.
We detected weak genetic structure in tumors that broadly re ected transcriptomic variation among localities and may re ect different tumor strains. However, we acknowledge that the persistence of a small number of host variants in the tumor dataset may produce a residual signal of host population structure, despite rigorous bioinformatic ltering to exclude known devil variants genotyped from lip RNAseq data as well high-coverage whole genome sequences (see Additional le 3: Text S3). Further, although we identi ed a number of variants in tumors that were predicted to affect the function of genes differentially expressed among localities, no signi cant associations were detected. Thus, transcriptional variation in tumors may be purely regulatory in origin, or we may have lacked su cient statistical power (19 samples) to identify genotype-expression associations. Alternatively, population genetic structure and adaptive and plastic responses to the local environment in hosts can produce variation in immune function that drive differential responses to wildlife disease [14,86]. Our study sites have similar vegetation communities and experience similar climatic conditions but decrease in elevation from east to west. Region-speci c adaptation of devils to local environmental conditions exists, as well as signi cant selection imposed by DFTD following its arrival in naïve populations [34,36,70]. Different transcriptomic responses of DFTD to devils from different localities may thus also be explained by immunological variation among devil populations that is driven by environmental differences that lead to differential exposures to non-DFTD infectious agents.

Conclusions
Characterization of DFTD gene expression and our broad analysis of transcriptomic variation in devils and DFTD points to a potential link between the relative degrees of DFTD-characteristic gene expression patterns and host population. Whether this variation is due to distinct tumor strains with differing phenotypes or by locally speci c differences in host tolerance/resistance remains unclear. Nonetheless, geographic variation in gene expression among tumors highlights the potential for ongoing devil-tumor evolution. DFTD arrived at each of the populations in this study at different times, and thus the coevolutionary processes that in uence the nature of host-tumor molecular interactions may be at different stages for each. Alternatively, or perhaps concurrently, environmental variation among localities may also be in uencing spatial variation in tumor gene expression. To separate these alternative explanations would require replicated sampling of time points with respect to DFTD arrival across different localities. Given the well-documented east-to-west spread of DFTD over a 25-year period, this system is highly suited to exploration of host-pathogen coevolutionary relationships at various times since disease arrival.

Sample collection
Lip and tumor tissues were collected as 3 mm biopsies from live, wild Tasmanian devils between 2016-2019 from three locations in northern Tasmania, Australia: Black River (BR), Takone (TKN), and West Pencil Pine (WPP) ( Fig. 1; Additional le 1: Table S1). Detailed descriptions of ethically-approved trapping protocols can be found in Hawkins et al. [19] and Hamede et al [84]. All research involving use of animals and eld activity was performed under University of Tasmania ethics approval A13326 and WSU IACUC approval ASAF 6796. Normal (i.e., noncancerous) tissue from inside of the lip was chosen due to its proximity to where DFTD tumor allografts typically implant, increasing the likelihood of detecting localized tissue responses in DFTD-infected animals, while ensuring both animal and handler safety.
Where individuals had multiple tumors, biopsies were taken from the largest tumor. For each population, lip biopsies from at least 3 DFTD-infected and 3 uninfected devils of each sex were collected, along with corresponding tumor biopsies from every infected devil. All biopsies were immediately preserved in RNAlater, kept at -20℃ for up to two weeks while in the eld, and subsequently stored at -80℃ until RNA extraction.

RNA extraction, library preparation, and sequencing
We

Sequence alignment and transcript assembly
We conducted initial quality screening of raw sequencing reads using FastQC and assessed study-wide sequencing quality using MultiQC. To trim and lter reads for quality, we used the TrimGalore wrapper with relatively relaxed settings due to the documented negative effects of stringent ltering on RNAseq analyses [87]. Speci cally, we trimmed adapter sequences and ends with a Phred quality score < 10 and removed reads < 30 bp long.
We performed sequence alignments and assembled transcripts according to protocols recommended by Pertea et al. [88]. Using Hisat2 v 2.1.0 [89], we aligned the trimmed reads to the published Tasmanian devil reference genome (Murchison 2012), specifying the --dta ag to produce alignments suitable for transcript assembly. We used samtools to sort the alignments prior to assembling the transcripts using Stringtie v 2.1.0 [90] and the Ensembl reference annotation v 7.0.97 [91]. Then, using Stringtie, we merged all transcripts into a non-redundant assembly and used gffcompare v 0.11.7 [92] to annotate the merged assembly and examine how the assembled transcripts compare with the reference annotation. We then repeated the transcript assembly for each sample, using the merged annotation as the reference, and generated tables of transcript abundances including only transcripts that appeared in the merged reference annotation.

Differential expression analysis
Prior to differential expression analysis, tables of transcript abundances were converted into read tables of gene-level abundances using the R package tximport [93]. All differential expression analyses and preparatory steps were conducted in R using EdgeR [94] unless otherwise indicated, with the entire pipeline run three times as distinct analysis sets: 1) tumor-lip contrasts to investigate differential expression in tumors compared to host lip tissues; 2) lip-only contrasts to investigate variation in gene expression among hosts (e.g., with respect to sex and/or DFTD infection status); 3) tumor-only contrasts to investigate variation in expression among tumors (e.g., with respect to host sex or locality). For the tumor-lip contrasts, all lip and tumor transcriptomes were analyzed together, while the lip-only contrasts only included lip transcriptomes and the tumor-only contrasts only included tumor transcriptomes.
To ensure our analysis contained only genes that were expressed across experimental groups, we removed genes lacking transcript counts > 10 in at least three samples. Read counts were then normalized among libraries using the calcNormFactors function. We included the following factors in our analysis: sex (of the host devil, in the case of tumors), infected (i.e., with or without DFTD), tissue (normal lip or DFTD tumor), population (sampling locality; BR, TKN, or WPP), and batch (batches 1 or 2 comprising 2016 samples, or batch 3 comprising 2018-2019 samples). Differential expression analyses were performed using a quasilikelihood generalized linear model framework [95], using a different model formula depending on the tissues being analyzed. To simplify the design for each model and more easily facilitate comparisons between groups of potentially interacting factors, two factors of interest were combined into a single group factor. For the lip-tumor and lip-only models, this group factor comprised sex and infected (female-uninfected, female-infected, male-uninfected, or male-infected). For the tumoronly model, the group factor comprised sex and population (female-BR, female-TKN, female-WPP, male-BR, male-TKN, or male-WPP). For all models, to account for any effect that differences in lab protocols (see above) may have had on gene expression measurements, we included batch as a factor. Accordingly, we speci ed the formula for the host lip tissue vs DFTD tumor tissue model as: For each model, we calculated log gene counts per million (logCPM) and constructed multidimensional scaling (MDS) plots using the R package limma [96] to identify clustering of samples according to our factors of interest. We observed MDS clustering of samples with respect to sequencing batch; this effect was subsequently corrected in all data visualization using the limma function removeBatchEffect to remove any variation in logCPM associated with the batch factor (Additional le 4: Figure S4). MDS plots represented the top 500 differentially expressed genes de ned as those with the largest standard deviations between all samples.

Enrichment analysis
For contrasts with large numbers of differentially expressed genes, we performed a PANTHER Overrepresentation Test to test for biological processes (as de ned by the PANTHER GO-Slim Biological Process annotation dataset) that are overrepresented among sets of differentially expressed genes. This comprises a Fisher's Exact Test for GO-terms that are over-or underrepresented relative to a 'background' reference list, which we speci ed as the full list of expressed genes from which the differentially expressed genes were detected. Signi cantly enriched GO-terms were identi ed at an FDR threshold of 0.05. We performed overrepresentation tests on sets of signi cantly over-and under-expressed genes separately as this has been shown to be more powerful for detecting differentially expressed pathways than analyzing all differentially expressed genes together [98]. To reduce redundancy among signi cantly enriched GO-terms and thus reduce the size and increase interpretability of our GO-term lists, we used REVIGO [99] to cluster highly related GO-terms together and identify single representative GO-terms for each cluster.
To test for general patterns of up-or down-regulation of biological pathways, we performed Gene Set Enrichment Analyses [GSEA; 100, 101] using pre-ranked lists of genes from differential expression analyses. Each expressed gene, irrespective of whether it was signi cantly differentially expressed, was ranked using a metric that measures the magnitude of the gene's log2FC as well as the nominal statistical signi cance of that change, calculated as . This produced a list whereby, for a given comparison between conditions, signi cantly upregulated genes had higher positive scores, signi cantly downregulated genes had lower negative scores, and genes that were neither up-nor downregulated had scores close to zero. Given such a ranked list, GSEA tests for known sets of related genes that have disproportionately higher or lower rankings, calculating an enrichment score (ES) and FDR-adjusted p-value for each gene set tested. We performed the analysis using GSEA v4.0.3 [100,101], testing gene sets from the most recent Reactome pathways database [102]. All GSEA contrasts were performed across 1,000 permutations and restricted to gene sets containing between 15-5000 genes, with the 'meandiv' normalization mode selected. For contrasts with relatively few (< 50) differentially expressed genes, we used the weighted enrichment statistic, which places greater emphasis on genes at the top and bottom of the ranked list. For contrasts with larger numbers of differentially expressed genes (thus having important genes ranked further down the list), we used the classic enrichment statistic, which weights rank positions equally.
To visualize and aid interpretation of our GSEA results, we produced similarity networks that cluster enriched gene sets according to numbers of overlapping genes and annotated each cluster of gene sets by common functions using the EnrichmentMap [103] and AutoAnnotate [104] add-ons to Cytoscape v 3.8.0 [105]. This accounts for redundancy between gene sets, reduces the overall complexity of the GSEA results, and facilitates identi cation of both major functional themes as well as unique or distinct pathways from among potentially hundreds of signi cant yet largely redundant gene sets. Gene sets were ltered to exclude those above an FDR q-value cutoff of 0.01. Gene set clusters were de ned using the Markov Cluster (MCL) algorithm with default settings. For each cluster, the top three most frequently occurring words among gene set names -normalized according to their frequency across all gene setswere used to concisely annotate each cluster for generalized functions. We made minor manual edits to annotations where automatic annotation produced misleading results.

Genotype analysis
We performed genotyping of RNA sequences according to the Broad Institute's (Cambridge, MA, USA) Best Practices Work ow for RNAseq short variant discovery. Detailed descriptions of the genotyping and variant ltration work ow are provided in Additional le 3: Text S3. To characterize population genetic structure in both devils and DFTD tumors that may be associated with among-population variation in tumor gene expression, we performed separate discriminant analyses of principal components (DAPC), implemented in the R package adegenet, using the respective SNP datasets. We de ned groups according to sample locality to characterize the extent of genetic differentiation among them; however, we also used the nd.clusters function to infer genetic clusters purely from SNP variation, without a priori designation.
To identify speci c SNPs and indels potentially associated with differentially expressed genes, we used SnpEff [44] to annotate all genotyped tumor variants according to their positions with respect to known genes (using the Ensembl v 7.0.86 annotation for S. harrisii) and quantifying each variant's predicted impact on gene function. For example, a variant leading to loss or gain of a stop codon would have a predicted high impact, a nonsynonymous variant would be predicted to have a moderate impact, and a synonymous variant would be predicted to have a low impact [44]. We evaluated these annotations against our lists of differentially expressed genes to identify variants potentially contributing to or associated with up-or downregulation of genes. We tested all variants predicted to affect any gene that was differentially expressed between any pair of localities for their effects on expression of their respective genes (measured as counts per million and normalized as above) using a Kruskal-Wallis test.
P-values were adjusted for multiple comparisons using the Benjamini-Hochberg approach, with FDR threshold of 0.05.