Differential gene expression analysis of the whole blood transcriptome between young and old companion border collie dogs

doi:10.21203/rs.3.rs-1715073/v1


Research


This work is licensed under a CC BY 4.0 License

Version 1


Background

Aging is the most significant risk factor in many diseases and for mortality alike, and it is known to be influenced by both genetic and environmental factors. Due to dogs’ importance in human societies, the study of aging in companion dogs is worthwhile in its own right. Still, dogs could also be ideal translational model animals for human aging research. In this study, our primary goal was to investigate the gene expression changes with age in whole blood samples of dogs. The advantage of using blood is its accessibility and known biomarkers for diseases.

Methods

We sequenced the poly(A)-tailed RNA fraction in the blood samples of five young and five old companion border collie dogs and implemented a differential gene expression analysis. We removed haemoglobin related reads in silico, following standard raw data quality control and alignment to the reference genome. Sample collection and preparation, sequencing and data analysis were implemented using standard laboratory methods and RNA-seq data analysis pipelines.

Results

The estimated statistical power of the study was 98% according to an initial power analysis with a 10% false discovery rate and minimum fold change of 3. Raw sequence data quality was high with an 86% alignment rate to the CanFam 3.1 dog reference genome. Following in silico haemoglobin read removal, 23-74 million reads remained for the differential gene expression analysis. In contrast to our expectations, we could not differentiate age clusters in either a multidimensional scaling analysis or in a principal component analysis. We found only a limited number of differentially expressed genes (n=61) between young and old dogs’ blood transcriptome. The only significant gene ontology term was response to bacterium with a fold change of 8.62.

Conclusions

In contrast with the large number of differentially expressed genes in the prefrontal cortical brain region of dogs, we could identify only one biomarker for aging in blood. Confounding factors (e.g., diet) might have influenced our results. Future research could also use biological age instead of the animals’ chronological age.

RNA-sequencing

dog

aging

blood

transcriptome

Aging is one of the most significant risk factors for illnesses and mortality [1]. A wide range of age-related changes (e.g., hearing loss) and neurodegenerative disorders (e.g., Alzheimer’s disease) severely impact the quality of life of older people. Aging is influenced by environmental and genetic factors [2], but gene-by-environment interactions also play an essential role in aging-related phenotypes [3]. A model organism with a much shorter lifespan than humans can facilitate the study of the environmental and genetic factors of aging.

In recent years, several authors have recommended the dog as a model for human aging (e.g., [4]; [5]) and for age-related diseases such as Alzheimer’s disease [6]. Indeed, many characteristics of the domestic dog make it an ideal model species for aging: companion dogs share a very similar environment with their owners, and they also naturally develop some age-related diseases that are analogous to human diseases (e.g., [7]). Pet dogs also have a considerably shorter lifespan than their owners, making longitudinal studies easier. For a detailed review on the relevance of dogs in the study of the genetic background of human aging, see Sándor and Kubinyi [8].

In our previous study, we investigated the differential gene expression patterns in the prefrontal cortex of young and old dogs in order to characterise the gene expression changes in aging dogs [9]. In the current study, we extend this previous research by studying the same parameters in a more accessible tissue, namely whole blood. A major advantage of whole blood is that it can be easily collected from the subjects, unlike the brain tissue in our previous experiment, which was collected by the Canine Brain and Tissue Bank [10]. Consequently, blood biomarkers of aging or disorders are of greater practical importance in animal and human healthcare. However, using blood has some disadvantages, too. Whole blood contains a high number of red blood cells, which contain large amounts of hemoglobin (Hg) related mRNA [11]. In an RNA sequencing experiment, these mRNA molecules can represent a significant proportion of the total mRNA. Other mRNA molecules will therefore be underrepresented unless the sequencing depth is increased significantly, much as in an RNA-seq experiment in which ribosomal RNA is not depleted prior to sequencing. Harrington et al. [11] reported that implementing globin RNA depletion prior to RNA sequencing of whole blood is advantageous in blood transcriptome sequencing studies. However, so far, the only publicly available blood transcriptome sequencing data from dogs [12] did not include this pre-sequencing hemoglobin depletion step. That study compared the whole blood transcriptome between dogs and wolves in search of signs of domestication. Therefore, sequencing the whole blood transcriptome without prior globin depletion was advantageous in our case, as it allowed a direct comparison with the data of Yang et al. [12] as well as the extension of this previous experiment [13]. 
A more recent study described RNA library preparation protocols for hemoglobin-rich tissues specifically in dogs, which included a globin depletion step as well (data available upon request from the authors) [14].

Regarding age-related transcriptome changes, Schaum et al. [15] examined 17 organs through the complete life cycle of mice from the age of 1 to 27 months by RNA sequencing and performed differential gene expression analyses. Although they discovered a significant number of differentially expressed genes when comparing young adult mice (max. 6 months old) to elderly mice (24 months old), this was not replicated when the same group of 6-month-old mice was compared to either the 21- or the 27-month-old groups. Furthermore, Schaum et al. studied white blood cells (buffy coat) from the mice, in contrast to our whole blood samples. Age-related changes in RNA levels were previously studied in humans in lymphocytes [16] or in whole blood ([17]; [18]). Peters et al. [17] identified ~1500 differentially expressed genes as a function of age in a large-scale study, in which potentially active CpG methylation sites were overrepresented. Pathways associated with these genes included dysregulation of transcription and translation, DNA damage accumulation, immune senescence and ribosome biogenesis. Novel pathways that had not previously been linked to aging, e.g., actin remodeling, were also reported in this study. Viñuela et al. [18] reported 680 protein-coding genes with at least one exon whose expression level changed as a function of age, based on a twin study (age range: 39–85 years). They found that age explained ~5% of the variance of gene expression changes in age-associated exons. A similar analysis identified 625 differentially expressed genes between young and old wolves, affecting e.g., immune response and RNA metabolic process [19], functions partly shared with those identified by Peters et al. [17].

The main aim of this study was to investigate the gene expression differences with age in dogs in whole blood samples. To answer this question, we sampled 5 young and 5 old companion dogs of the same breed (border collie), isolated and sequenced the protein coding transcripts, and implemented a differential gene expression analysis. Our secondary aim was to investigate the effects of the hemoglobin-related transcripts on the data quality and the RNA-seq results alike.

2.1 Samples

Ten border collies were sampled for this experiment, divided into two distinct age groups: the young cohort included animals of age 1–3 years (n = 5; mean age: 1.4 years), and the old cohort included animals of age 10–15 years (n = 5; mean age: 12 years). All dogs were raised and kept as companion dogs in Hungary and were unrelated. All dogs’ breed and age were certified by their official documents signed by their veterinarian. Owners were interviewed about their dog’s health status; according to the owners, the dogs had not shown any sign of illness and had not taken any medication in the two weeks before sampling. We sent one ml of blood to a veterinary diagnostic laboratory to verify the dogs’ health status independently of the owners’ reports. Both the sex and the neuter status were mixed in both cohorts (Table 1).

Table 1

Summary of the analysed samples.
ID	Age (years)	Breed	Sex	Neutered	RNA concentration (ng/µl)
CL_y1	1	Border collie	Male	Yes	56.4
CL_y2	1	Border collie	Female	No	38.6
CL_y3	3	Border collie	Female	No	67.0
CL_y4	1	Border collie	Female	Yes	42.2
CL_y5	1	Border collie	Female	No	36.8
CL_o1	11	Border collie	Female	Yes	53.0
CL_o2	10	Border collie	Male	No	38.4
CL_o3	15	Border collie	Female	Yes	31.8
CL_o4	14	Border collie	Male	No	43.2
CL_o5	10	Border collie	Female	Yes	50.8

2.2 Blood sampling, RNA extraction, library preparation and sequencing

Three ml of blood was taken from the vena cephalica or the vena saphena lateralis in accordance with the principles of “lege artis” by an experienced veterinarian in the presence of the dog’s owner. In order to preserve the RNA fraction of the blood as intact as possible, it was collected directly into DNA/RNABlood Collection Tubes (Zymo Research). Blood was gently mixed with the special RNA preservative fluid and stored at -20°C. Quick-DNA/RNA Blood Tube kits (Zymo Research) were used for RNA extraction. Isolated RNA was stored in a -80°C ultra-low temperature freezer until further processing.

RNA samples were treated with DNase in order to remove DNA contamination from the RNA samples. Protein coding RNA was purified via poly(A) capture, and library preparation was performed with the TruSeq® Stranded mRNA Library preparation kit (Illumina, CA, USA). Sample quality control was implemented after the DNase treatment and after the library preparation alike (RIN: 8.5–10.0 in all samples). Following the successful library preparation, samples were sequenced at the iBioScience company (Pécs, Hungary) with a Novaseq 6000 Illumina sequencer machine. The minimum read number was set to 42 million paired-end reads per sample. Read length was 150 basepairs (bp). Sequence data was shared with us by the company in fastq format, and our data analysis started with the quality control of the raw sequence data.

In accordance with the recommendations of Harrington et al. [11], we performed an in silico hemoglobin RNA depletion, which – as shown by Harrington et al. [11] – is expected to improve the overall performance of a whole blood RNA-seq experiment.

2.3 Data analysis

Throughout the analysis, we aimed to use the default parameters of each software; however, this was not always possible. In the following description, we specify every parameter setting (except the obligatory ones) for every software that we had changed; we also provide short reasoning whenever our parameter setting is not straightforward.

2.3.1 Quality check and data preparation

The FastQC software ([20]; RRID:SCR_014583) was used with its default parameter settings to check the raw read quality; the -o option was set to change the output directory, and the --noextract and -f options were used to keep the output archive compressed and to define the file format, respectively. Following the quality control, the cutadapt software ([21]; RRID:SCR_011841) was used to perform the following tasks: 1) remove the trailing high-quality guanine (G) bases from the 3’ end of the sequences, which are added to the shorter fragments by the Novaseq 6000 sequencer, a 2-dye sequencing system, and 2) remove the adapter sequences. In addition to the -a and -A options, which specify the 3’ adapter sequences for the first and second read of each pair, we used the -j option to set the number of CPU cores to five, the -m option to discard all reads shorter than 50 bp, and the --nextseq-trim 20 option to remove the trailing G bases from the sequences. In a second cutadapt run, the -u 17 and -U 17 options were used to hard-trim additional bases that did not pass the FastQC quality control.
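As a rough illustration of the trimming steps described above, the following Python sketch assembles the two cutadapt command lines as argument lists. The file names and adapter sequences are hypothetical placeholders (the text does not report them), and the -o and -p output options are assumed because paired-end output requires them. The commands are only built and printed, not executed.

```python
import shlex
from typing import List

# Hypothetical placeholders: the adapter sequences are not given in the text.
ADAPTER_R1 = "ADAPTER_SEQUENCE_READ1"
ADAPTER_R2 = "ADAPTER_SEQUENCE_READ2"


def first_pass(r1: str, r2: str, out1: str, out2: str) -> List[str]:
    """First cutadapt run: adapter removal, poly-G trimming, length filter."""
    return [
        "cutadapt",
        "-j", "5",               # use five CPU cores
        "-a", ADAPTER_R1,        # adapter for the first read of each pair
        "-A", ADAPTER_R2,        # adapter for the second read of each pair
        "--nextseq-trim", "20",  # quality trimming that also removes trailing Gs
        "-m", "50",              # discard reads shorter than 50 bp
        "-o", out1,              # assumed output file, first reads
        "-p", out2,              # assumed output file, second reads
        r1, r2,
    ]


def second_pass(r1: str, r2: str, out1: str, out2: str) -> List[str]:
    """Second cutadapt run: hard-trim 17 bases from both reads of each pair."""
    return ["cutadapt", "-u", "17", "-U", "17", "-o", out1, "-p", out2, r1, r2]


if __name__ == "__main__":
    # Hypothetical file names for one sample.
    print(shlex.join(first_pass("s_R1.fastq.gz", "s_R2.fastq.gz",
                                "s_trim1_R1.fastq.gz", "s_trim1_R2.fastq.gz")))
    print(shlex.join(second_pass("s_trim1_R1.fastq.gz", "s_trim1_R2.fastq.gz",
                                 "s_trim2_R1.fastq.gz", "s_trim2_R2.fastq.gz")))
```

Keeping the arguments as a list rather than a single string makes it straightforward to pass them to a process launcher later without shell quoting problems.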

In the next step, we aligned the reads to the dog reference genome (genome version: CanFam 3.1, [22]; genome annotation: Ensembl v98, [23]). We used the HISAT2 split-read aligner ([24]; RRID:SCR_015530) with the following options: -p 6; --dta; input files were provided with the -1 and -2 options, which allowed the processing of paired-end reads.

Next, we determined the dog homologs of the human hemoglobin genes published by Harrington et al. ([11]; Table S1 of that article) and removed the reads aligned at the dog homologs of the hemoglobin genes (Table 2) for one part of the analysis. The homologs were identified using in-house scripts. The reads were removed using the BEDtools (intersectBed; [25]; RRID:SCR_006646) and the Picard software packages (FilterSamReads tool; [26]; RRID:SCR_006525). The genes’ relevance in our samples was also investigated based on their actual expression levels. Although the hemoglobin genes were excluded from the differential gene expression analysis, the effect of their presence in the RNA library was investigated and discussed below. Hemoglobin reads were removed immediately after the alignment, directly from the bam files.

Table 2

Hemoglobin-related genes from Harrington et al. [11] and their canine orthologs together with genomic information about the genes. The dashed line separates the active and archived canine genes in Ensembl’s annotation database.
Chr¹	Gene start	Gene end	Strand	Dog gene ID	Human gene ID	Human gene abbreviation	Human gene name	Gene type	Ortholog type	Dog gene’s status
6	40324263	40326351	Reverse	ENSCAFG00000032615	ENSG00000188536	HBA2	hemoglobin subunit alpha 2	Protein coding	Many to many	Active
6	40324263	40326351	Reverse	ENSCAFG00000032615	ENSG00000206172	HBA1	hemoglobin subunit alpha 1	Protein coding	Many to many	Active
6	40330683	40332438	Reverse	ENSCAFG00000029224	ENSG00000206177	HBM	hemoglobin subunit mu	Protein coding	One to one	Active
6	40342562	40343504	Reverse	ENSCAFG00000028569	ENSG00000130656	HBZ	hemoglobin subunit zeta	Protein coding	One to many	Active
21	28181347	28205138	Reverse	ENSCAFG00000030286	ENSG00000223609	HBD	hemoglobin subunit delta	Protein coding	Many to many	Active
21	28181347	28205138	Reverse	ENSCAFG00000030286	ENSG00000244734	HBB	hemoglobin subunit beta	Protein coding	Many to many	Active
6	40326459	40329857	Reverse	ENSCAFG00000029904	ENSG00000188536	HBA2	hemoglobin subunit alpha 2	Protein coding	Many to many	Archived
6	40326459	40329857	Reverse	ENSCAFG00000029904	ENSG00000206172	HBA1	hemoglobin subunit alpha 1	Protein coding	Many to many	Archived
6	40331793	40332696	Reverse	ENSCAFG00000031055	ENSG00000130656	HBZ	hemoglobin subunit zeta	Protein coding	One to many	Archived
21	28179119	28180299	Reverse	ENSCAFG00000029518	ENSG00000223609	HBD	hemoglobin subunit delta	Protein coding	Many to many	Archived
21	28179119	28180299	Reverse	ENSCAFG00000029518	ENSG00000244734	HBB	hemoglobin subunit beta	Protein coding	Many to many	Archived
21	28193272	28194670	Reverse	ENSCAFG00000024181	ENSG00000196565	HBG2	hemoglobin subunit gamma 2	Protein coding	One to many	Archived
21	28193272	28194670	Reverse	ENSCAFG00000024181	ENSG00000213934	HBG1	hemoglobin subunit gamma 1	Protein coding	One to many	Archived
¹ Chr: chromosome
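The read removal itself was performed with BEDtools and Picard; the underlying logic can be sketched in plain Python as an interval-overlap filter. The gene coordinates below are copied from Table 2, whereas the read tuples, the function names and the decision to include the archived gene models alongside the active ones are assumptions made purely for illustration.

```python
from typing import List, Tuple

# (chromosome, gene start, gene end) of the unique canine intervals in Table 2.
HB_INTERVALS: List[Tuple[str, int, int]] = [
    ("6", 40324263, 40326351),   # active
    ("6", 40330683, 40332438),   # active
    ("6", 40342562, 40343504),   # active
    ("21", 28181347, 28205138),  # active
    ("6", 40326459, 40329857),   # archived
    ("6", 40331793, 40332696),   # archived
    ("21", 28179119, 28180299),  # archived
    ("21", 28193272, 28194670),  # archived
]

# Hypothetical minimal read record: (read name, chromosome, start, end).
Read = Tuple[str, str, int, int]


def overlaps_hb(chrom: str, start: int, end: int) -> bool:
    """True if a closed interval overlaps any hemoglobin gene interval."""
    return any(
        chrom == g_chrom and start <= g_end and end >= g_start
        for g_chrom, g_start, g_end in HB_INTERVALS
    )


def remove_hb_reads(reads: List[Read]) -> List[Read]:
    """Keep only the reads that do not overlap a hemoglobin gene."""
    return [r for r in reads if not overlaps_hb(r[1], r[2], r[3])]


if __name__ == "__main__":
    toy_reads = [
        ("read1", "6", 40325000, 40325149),   # inside a hemoglobin gene
        ("read2", "6", 50000000, 50000149),   # same chromosome, elsewhere
        ("read3", "21", 28200000, 28200149),  # inside a hemoglobin gene
    ]
    print(remove_hb_reads(toy_reads))
```

A linear scan over eight intervals is sufficient here; for a genome-wide interval set an indexed structure would be the more usual choice.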

2.3.2 Differential gene expression analysis

Differential gene expression analysis was implemented between the young and old cohorts. The DESeq2 R package ([27]; RRID: SCR_015687) was used to perform the differential gene expression analysis, as described in the authors’ workflow published by Love et al. [28]. The input of the DESeq2 software is a count matrix, where k_i,j is the number of aligned reads at gene i in animal j. This count table was generated with the featureCounts() function of the Rsubread R package, as recommended in the cited workflow. The following options were used to generate the appropriate data table: isGTFAnnotationFile = T, isPairedEnd = T, countMultiMappingReads = F, the latter being an important prerequisite of the software [27].

We decided to apply the more conservative definition of expressed gene from the workflow, and therefore in this study, all genes with at least ten reads in five or more samples were considered as an expressed gene in the blood. The minimum number of individuals (i.e. five) was selected as this was the size of both animal cohorts in our study. We applied the regularised-logarithm transformation (rlog), which is recommended for our sample size. This choice was also supported by the transformation efficiency comparison of the three implemented transformations in the DESeq2 package (rlog, variance stabilising transformation – or vst – and the log₂ transformation; Figure S1). This transformation was used when the data were analysed in the descriptive statistical tests. However, it was not used for the differential gene expression analysis.
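The expression filter described above (at least ten reads in five or more samples) can be sketched as follows. This is a plain-Python illustration rather than the R code used in the study; the gene names and counts are invented toy values.

```python
from typing import Dict, List


def is_expressed(counts: List[int], min_reads: int = 10, min_samples: int = 5) -> bool:
    """True if at least `min_samples` samples have `min_reads` or more reads."""
    return sum(1 for c in counts if c >= min_reads) >= min_samples


def filter_expressed(count_matrix: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Keep only the rows (genes) of the count matrix that pass the filter."""
    return {gene: row for gene, row in count_matrix.items() if is_expressed(row)}


if __name__ == "__main__":
    # Toy count matrix: one row per gene, one column per animal (10 animals).
    toy = {
        "geneA": [12, 15, 30, 11, 10, 0, 0, 0, 0, 0],  # 5 samples >= 10 reads
        "geneB": [9, 9, 9, 9, 9, 9, 9, 9, 9, 9],       # never reaches 10 reads
        "geneC": [100, 50, 0, 0, 0, 0, 0, 0, 0, 20],   # only 3 samples >= 10
    }
    print(sorted(filter_expressed(toy)))
```

With a threshold of five samples, a gene expressed in only one of the two cohorts can still pass the filter, which is the rationale given in the text for matching the threshold to the cohort size.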

No parameter other than chronological age was expected to have a systematic effect on the gene expression levels and on the differential gene expression analysis. Therefore, the model used in the differential gene expression analysis included only an intercept and the age effect. The false discovery rate (FDR-) adjusted p-values were calculated, and the default cutoff value of 0.1 was used to identify the significantly differentially expressed genes. We performed the analysis with and without excluding the hemoglobin-related reads, which allowed us to compare the effects of the presence of the hemoglobin genes in the dataset.
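To make the FDR adjustment concrete, the sketch below implements the Benjamini-Hochberg procedure, which is the default adjustment method in DESeq2, and applies the 0.1 cutoff. It is a simplified stand-alone illustration: DESeq2 additionally applies independent filtering before the adjustment, which is not reproduced here, and the p-values in the usage example are invented.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant(pvalues: List[float], alpha: float = 0.1) -> List[bool]:
    """Flag tests whose adjusted p-value is below the FDR cutoff."""
    return [p_adj < alpha for p_adj in benjamini_hochberg(pvalues)]


if __name__ == "__main__":
    toy_pvalues = [0.001, 0.2, 0.03, 0.8, 0.04]  # invented values
    print(benjamini_hochberg(toy_pvalues))
    print(significant(toy_pvalues))
```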

Figures were created using R packages (ggplot2 (RRID:SCR_014601), DESeq2, VennDiagram (RRID:SCR_002414)), while tables were primarily created in MS Office Excel.

The outline of the analysis pipeline is shown in Figure S2.

2.4 Additional analyses

During the described analysis, we observed some divergence between the descriptive statistics of our data and most published RNA-seq experiments. Most notably, we observed a high proportion of secondary alignments in our data and a strong correlation between the initial RNA concentration levels in the samples and the sequencing depth. During the study we also investigated these phenomena.

Prior to blood sampling, all owners confirmed that the dogs were healthy and did not show any signs of illness at least in the past two weeks prior to the sampling date. Furthermore, a veterinarian also examined the dogs prior to blood sampling and all donors were classified as healthy without any infections. A routine laboratory blood test was also carried out, with negative results, indicating a good general health (Supplementary Data 1).

A power analysis was carried out as implemented in the RnaSeqSampleSize R package [29]. The FDR level was set to 10%, the default value in the DESeq2 R package used for the differential gene expression analysis here, and the minimum fold change (rho) was set to 3. The estimated power was 0.98, indicating that group sizes of five samples, combined with the high sequencing depth, were sufficient for the study.

3.1 Raw data quality

Table 3 shows the raw data statistics of the analysed samples. Although the young animals had a higher average sequencing depth (142 million vs. 128 million reads), this was primarily due to an outlier sample (CL_y3), which was sequenced to ~ 211 million reads. Without this individual, there was no significant difference between the two groups with regard to sequencing depth. After removing the outlier, the only significant difference between the age groups could be observed in the number of secondary alignments, including the hemoglobin genes as well (11 million reads – or 8% – more in the old cohort).

Table 3

Sequencing and alignment statistics of the 10 samples.
Sample ID	Number of sequenced reads	Number of reads after adapter trimming	Aligned reads (N)	Aligned reads (%)	Hemoglobin reads (N)	Hemoglobin reads (%)	Secondary alignments with hemoglobin genes (N)	Secondary alignments with hemoglobin genes (%)	Secondary alignments without hemoglobin genes (N)	Secondary alignments without hemoglobin genes (%)
CL_y1	139574092	99781996	87552180	87.74	64061880	73.17	100056226	100.27	2175945	2.18
CL_y2	145101716	114231400	98754640	86.45	24426835	24.73	33712226	29.51	5736365	5.02
CL_y3	211298362	137241346	120949359	88.13	91142725	75.36	174121970	126.87	2585114	1.88
CL_y4	115495360	102506414	87090986	84.96	20523206	23.57	36225285	35.34	4687901	4.57
CL_y5	100769724	76546858	66152258	86.42	35704823	53.97	51500272	67.28	2617499	3.42
CL_o1	132309048	100814174	86252499	85.56	38902610	45.10	75151359	74.54	4017804	3.99
CL_o2	115616142	93060894	79023663	84.92	37103408	46.95	57135456	61.40	3463513	3.72
CL_o3	120024014	91619710	80186573	87.52	29403124	36.67	31520157	34.40	3990157	4.36
CL_o4	124942900	103191914	88892129	86.14	30694317	34.53	54958699	53.26	3720187	3.61
CL_o5	148054394	103245778	90434886	87.59	58026387	64.16	111519462	108.01	2956990	2.86
young average	142447851	106061603	92099885	86.74	47171894	50.16	79123196	71.85	3560565	3.41
old average	128189300	98386494	84957950	86.35	38825969	45.48	66057027	66.32	3629730	3.71

Although the raw data quality checks (adapter trimming, removal of the high-quality poly-G sequences from the 3’ end of the reads, hard trimming of the ends of the sequences, and removal of reads shorter than 50 bp) removed 24% of the reads, approximately 100 million reads per sample were still retained for the analysis (ranging from 76 to 137 million reads in the young cohort and from 91 to 103 million reads in the old cohort; Table 3). This number of reads corresponds to ~50 million fragments.

The alignment rate was high (86%; on average 92 and 85 million reads aligned in the young and old cohorts, respectively). Therefore our dataset was appropriate for the planned differential gene expression analysis.

We also investigated the hemoglobin genes and the number of reads aligning to these genes. On average, 43 million reads (or 48% of the aligned reads) aligned to the hemoglobin genes. A large variation could be observed in the data with respect to the number of hemoglobin-related reads, ranging from 24 to 75% of the aligned reads in the samples (Table 3).

The hemoglobin-related reads’ filtering led to a large and significant reduction in the total read number, which was reduced to 23–74 million for the different samples. This affected the samples differently, with the largest changes in the young cohort: three samples had 23–30 million reads, while 2 samples had 66 and 74 million reads after removing the Hg-related reads. The range of the read numbers in the old cohort remained more similar, but a considerable variation existed in that group as well (32–58 million reads per sample). The varying amount of hemoglobin-related mRNA introduced an unwanted bias to our experiment.
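The per-sample read numbers quoted above follow directly from Table 3 by subtracting the hemoglobin reads from the aligned reads. The short Python sketch below reproduces that arithmetic; the numbers are copied from Table 3, while the function names are illustrative only.

```python
from typing import Dict, Tuple

# (aligned reads, hemoglobin reads) per sample, copied from Table 3.
TABLE3: Dict[str, Tuple[int, int]] = {
    "CL_y1": (87552180, 64061880),
    "CL_y2": (98754640, 24426835),
    "CL_y3": (120949359, 91142725),
    "CL_y4": (87090986, 20523206),
    "CL_y5": (66152258, 35704823),
    "CL_o1": (86252499, 38902610),
    "CL_o2": (79023663, 37103408),
    "CL_o3": (80186573, 29403124),
    "CL_o4": (88892129, 30694317),
    "CL_o5": (90434886, 58026387),
}


def reads_after_hb_removal(table: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Aligned reads remaining once the hemoglobin reads are subtracted."""
    return {sample: aligned - hb for sample, (aligned, hb) in table.items()}


def hb_percentage(table: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Hemoglobin reads as a percentage of the aligned reads."""
    return {sample: 100.0 * hb / aligned for sample, (aligned, hb) in table.items()}


def cohort_range(values: Dict[str, int], prefix: str) -> Tuple[int, int]:
    """Minimum and maximum over the samples whose ID starts with `prefix`."""
    cohort = [v for sample, v in values.items() if sample.startswith(prefix)]
    return min(cohort), max(cohort)


if __name__ == "__main__":
    remaining = reads_after_hb_removal(TABLE3)
    for sample, n in remaining.items():
        print(f"{sample}: {n / 1e6:.1f} million reads remain")
    print("young range:", cohort_range(remaining, "CL_y"))
    print("old range:", cohort_range(remaining, "CL_o"))
```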

The number of secondary alignments was also affected by the hemoglobin genes. The average proportion of secondary alignments compared to the primary alignments was 69%, but it went up to as high as 127%, i.e. more secondary alignments were present in some samples than primary alignments. However, when the hemoglobin reads were removed, the proportion of multi-mapped reads dropped to a normal level, and the secondary alignments were at an average level of 3.6% across the samples, with a negligible difference between the age cohorts (3.4% and 3.7% in the young and old cohorts).

Consequently, the hemoglobin reads and the associated genes – primarily due to the large within-cohort variation – represented a large, random bias in our dataset. As a result, as well as following the recommendations of Harrington et al. [11], both the Hg genes and the associated reads were excluded from the downstream analysis. A reduction in statistical power is expected due to the large reduction in the read counts.

3.2 Descriptive statistical analysis

A total of 12966 genes were expressed in the canine blood. The age clusters could not be differentiated in a multidimensional scaling analysis, which was applied on the rlog-transformed read counts of the ~ 13000 expressed genes (Fig. 1). This suggests that the chronological age of the dogs was not the primary source of the observed read count variance in our data.

Both a principal component analysis (Figure S3) and the Euclidean distances calculated from the rlog-transformed read counts of the samples (Figure S4) led to very similar observations, and neither of these two additional analyses could successfully differentiate the age groups. Thus, these analyses support the conclusion of the multidimensional scaling that age was not the primary source of the variation observed in the per-gene read counts.

3.3 Differential gene expression analysis

Figure 2 shows an MA-plot of all expressed genes in the companion dogs’ blood. The overwhelming majority of the genes had a log₂ fold change close to 0: the log₂ fold change was between −1 and 1 for 12541 genes (or 97% of the expressed genes). This implies that gene expression changes in the blood transcriptome of the dogs as a function of age are exceptionally rare. This is true in spite of the significant differences between the number of aligned reads without the hemoglobin reads in the two age cohorts (Table 3).

Indeed, we identified as few as 61 differentially expressed genes, which was 0.5% of all expressed genes. Of these, 31 were downregulated and 30 were upregulated in old dogs compared to young dogs (Fig. 3; the fold change of the significant genes ranges from 0.5 to 5.6). Clustering of the sequenced animals based on the expression profiles of the differentially expressed genes unsurprisingly separated the two age groups, in contrast with the previous analyses (e.g. MDS; Fig. 1), which were based on the normalised read counts of all expressed genes.

We also tested the effect of sex and neutering as covariates in the fitted model; neither of the two parameters had a significant effect on the results. Applying the same thresholds, out of the 12,966 tested genes only three were significantly differentially expressed between males and females and none between neutered and non-neutered animals.

Furthermore, an independent, parallel differential gene expression analysis with the edgeR R package, using its default parameter values, did not result in differentially expressed genes between the two groups (data not shown).

3.4 Functional analysis of the differentially expressed genes

Next, we compared the differentially expressed genes in a gene ontology (GO) overrepresentation test to the background of all expressed genes detected in the blood of dogs (n = 12966). This analysis was implemented using the pantherDB online tool ([30]; [31]). Only one GO term was significantly enriched in the overrepresentation test: the fold enrichment of response to bacterium (GO:0009617) was 8.62, and the corresponding false discovery rate adjusted p-value was 0.014.
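For readers unfamiliar with overrepresentation tests, the following Python sketch shows how a fold enrichment and a one-sided p-value can be computed for a single GO term. It assumes a hypergeometric upper-tail test, which is one common choice; the test settings used in the study are not stated in the text. The list size (61) and background size (12966) come from the text, whereas the numbers of annotated genes in the usage example are hypothetical and are not the study's values.

```python
from math import comb


def fold_enrichment(hits_in_list: int, list_size: int,
                    hits_in_background: int, background_size: int) -> float:
    """Observed number of annotated genes in the list divided by the expected number."""
    expected = list_size * hits_in_background / background_size
    return hits_in_list / expected


def hypergeom_upper_tail(hits_in_list: int, list_size: int,
                         hits_in_background: int, background_size: int) -> float:
    """P(X >= hits_in_list) when drawing list_size genes without replacement."""
    total = comb(background_size, list_size)
    upper = min(list_size, hits_in_background)
    tail = sum(
        comb(hits_in_background, i)
        * comb(background_size - hits_in_background, list_size - i)
        for i in range(hits_in_list, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical annotation counts for one GO term (NOT taken from the study).
    hypothetical_hits_in_list = 6
    hypothetical_hits_in_background = 150
    print(fold_enrichment(hypothetical_hits_in_list, 61,
                          hypothetical_hits_in_background, 12966))
    print(hypergeom_upper_tail(hypothetical_hits_in_list, 61,
                               hypothetical_hits_in_background, 12966))
```

In practice the p-value of each tested term would then be corrected for multiple testing, which is why the text reports an FDR-adjusted value.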

Aging is one of the most relevant factors in disease development and mortality. Recently, we identified genes that exhibit different expression levels between young and old dogs in the prefrontal cortex [9]. However, brain tissue samples are extremely difficult to collect; therefore, changes in gene expression related to pathological aging or disease cannot be monitored from brain tissue. This limitation could be alleviated by using whole blood, an easily accessible tissue, which is already a well-known source of biomarkers for many diseases. Here we explored the age-related differential gene expression patterns in whole blood to characterise genetic regulatory networks related to aging and search for possible RNA-based biomarkers of aging in dogs.

The produced raw data was of good quality. Although ~ 24% of the reads were filtered out before alignment to the reference genome, due to the very high initial sequencing depth, more than 100 million reads (~ 50 million sequenced fragments) per sample remained on average. The number of aligned reads surpasses the ENCODE recommendations by at least 30% (92 and 85 million on average in the young and old cohorts, respectively; [32]).

We found 61 differentially expressed genes after removing the hemoglobin-related genes. However, approximately 50% of them were likely false positives, because the evidence provided by the data was not convincing after a visual examination in a genome browser (data not shown). Another unexpected observation was that the range of the fold change (FC) values of the significant hits was much narrower than in our recent RNA-seq experiments on dogs (log₂ FC was between −1 and 2.5 in the current study, while it was between −8.19 and 7.54 in our dog vs. wolf experiment [13] and between −8.31 and 4.46 when we compared gene expression profiles in young and old dogs’ brain [9]). A parallel differential gene expression analysis implemented with the edgeR R package did not identify any differentially expressed genes between the examined cohorts.

Given the low number of DEGs (n = 61), it is not surprising that a gene ontology (GO) analysis identified only response to bacterium. It is possible that the difference is due to the dysregulated functioning of the older animals’ immune system. However, the connection between such a hypothesis and our data would be challenging to prove.

Hemoglobin-related reads were removed after the alignment to the reference genome, in accordance with the recommendations of Harrington et al. [11]. The removal had an uneven influence on the analysed samples, as 24–75% of the reads were lost, depending on the samples. This led to a disproportional reduction in the read counts among the samples, which introduced some uncertainty to our results. Consequently, the detection power was reduced in our analysis. However, there are indicators which suggest that this did not significantly affect our analysis. First, the different techniques used to analyse the raw data (multidimensional scaling, principal component analysis, sample distance calculations) yielded consistent results, increasing the confidence in the analysis. Second, the MA-plot (Fig. 2) does not show a difference in read counts between the two clusters, implying that there are not many differentially expressed genes in the two age cohorts. Finally, when our dataset was analysed in another context (investigating the differentially expressed genes between companion dogs and wolves; [13]; hemoglobin related reads were similarly excluded) with additional samples from Yang et al. [12], we did see differentially expressed genes between the dog and wolf samples. We could identify 90% of the differentially expressed genes published earlier [12] and identified additional differentially expressed genes (n = 1396). The results indicate that the sequencing depth was sufficient for a differential gene expression analysis even after removing the hemoglobin reads.

When we compared the hemoglobin-related reads between our data and that of Yang et al. [12], the same patterns could be observed. This comparison included the total read counts, the proportion of all reads, as well as the number and proportion of secondary alignments (some information about the data of Yang et al. [12] can be found in Table S1, while our own data is presented in Table 3). Consequently, our samples were not extreme in this regard: a large proportion of the reads aligned to the hemoglobin genes in both datasets, as hemoglobin-related reads made up on average 48% and 50% of the reads in our current study and in the dataset of Yang et al. [12], respectively. The variance of the number of hemoglobin-related reads between the samples was similarly high in both datasets. The proportion of the secondary alignments was lower in our samples than in Yang et al. ([12]; 69% vs. 149%). However, it is worth mentioning that the number of secondary alignments and the number of hemoglobin-related reads were significantly higher in wolves than in dogs, likely due to biological reasons rather than technical issues (see [13] for details). Hemoglobin genes are paralogous genes with 30–50% cDNA sequence identity, according to Ensembl [23]. This sequence identity explains why the proportion of multimapping reads was on average more than 10-fold higher with the hemoglobin reads included (Table 3) and why the proportion of the secondary alignments decreased to a normal level (~3%) in all datasets after removing the hemoglobin reads. A similarly high proportion of globin reads was found earlier in a human study without globin depletion (81%; [33]). These observations indicate that the RNA-seq data generated for this study are appropriate for a differential gene expression analysis even after removing the hemoglobin reads. Consequently, our differential gene expression analysis could be considered valid. There were indeed no significant differences between the gene expression profiles in the whole blood tissue of young and old dogs. 
However, to validate our results with high confidence, this study could be replicated with either an even higher sequencing depth or an analysis of red blood cell depleted blood samples.
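The proportion of secondary alignments discussed above can be derived from the FLAG field of SAM records, in which bit 0x100 marks a secondary alignment. The Python sketch below is a minimal illustration under one assumption: the proportion is taken as the number of secondary records divided by the number of primary records, which is one definition that allows values above 100%. The exact definition used for Table 3 is not restated here, and the records and the helper `toy_record` are hypothetical.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
UNMAPPED = 0x4         # SAM FLAG bit: read is unmapped
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def secondary_to_primary_ratio(sam_lines):
    """Count secondary and primary alignment records and return their ratio."""
    primary = 0
    secondary = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & (UNMAPPED | SUPPLEMENTARY):
            continue  # neither a primary nor a secondary alignment of interest
        if flag & SECONDARY:
            secondary += 1
        else:
            primary += 1
    if primary == 0:
        raise ValueError("no primary alignments found")
    return secondary / primary


def toy_record(name, flag):
    """Build a simplified SAM line for illustration (hypothetical values)."""
    fields = [name, str(flag), "chr1", "100", "60", "50M", "*", "0", "0", "*", "*"]
    return "\t".join(fields)


toy_sam = [
    "@HD\tVN:1.6",
    toy_record("r1", 0),    # primary alignment of a multimapping read
    toy_record("r1", 256),  # secondary alignment
    toy_record("r1", 272),  # secondary alignment, reverse strand
    toy_record("r1", 256),  # secondary alignment
    toy_record("r2", 16),   # primary alignment, reverse strand
    toy_record("r3", 4),    # unmapped read, ignored
]

ratio = secondary_to_primary_ratio(toy_sam)
print(f"secondary/primary = {ratio:.0%}")
```

In this toy input a single read with three secondary alignments is enough to push the ratio above 100%, which mirrors how reads from paralogous hemoglobin genes can inflate the proportion of secondary alignments.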

However, in contrast to our results, Charruau et al. [19] identified 625 differentially expressed genes between young and old wolves, of which 214 were up-regulated and 411 were down-regulated. They also described several enriched gene ontology terms among these genes. Several of these terms (such as regulation of metabolic process, RNA metabolic process, chromatin modification and immune response) were reported to be similar to GO terms previously identified in humans. Notably, Charruau et al. [19] collected samples from 27 wolves in Yellowstone National Park, presumably living under similar conditions, whereas our companion dogs lived in different homes. Environmental factors, such as physical activity and diet, can affect blood cells [34]. Therefore, it is possible that we did not identify differentially expressed genes because of these unknown confounding factors. Our sample size was too small to control for physical activity and diet, which might differ significantly between the sampled pet dogs.

On the other hand, we could exclude breed as a confounding factor, because we sampled only a single breed (border collie) in order to reduce the within-group variance. Further studies could investigate the effect of these parameters on whole blood gene expression levels and aging. A substantial increase in sample size might reveal the impact of these confounding factors.

5 Conclusions

In this study, we compared the gene expression levels of young and old border collie dogs and found only a minimal number of differentially expressed genes, with no apparent relationship to aging.

We suspect that some confounding environmental factors, such as diet, might have interfered with the differential gene expression analysis. The effects of these factors on blood gene expression levels should be studied in the future when looking for aging-related blood biomarkers. An interesting topic for future studies would be to investigate gene expression differences as a function of biological age, rather than the chronological age used in the current study, in a genetically heterogeneous population; this could give more detailed insight into the topic.

Like Harrington et al. [11], we recommend removing red blood cells prior to sequencing or, if this is not possible, removing the hemoglobin-related reads in silico. In the latter case, however, a much higher sequencing depth might be required to obtain the same statistical power to analyse the remaining protein-coding genes.
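As a rough guide to how much deeper one would have to sequence, the following Python sketch computes the depth multiplier needed to retain the same number of non-hemoglobin reads when a given fraction of reads is removed. It assumes that the hemoglobin fraction stays constant as depth increases and uses the retained read count as a simple proxy; it is not a formal power analysis, and the function name `depth_multiplier` is introduced here for illustration only.

```python
def depth_multiplier(globin_fraction: float) -> float:
    """Factor by which the sequencing depth must grow so that the number of
    non-hemoglobin reads matches that of a library without hemoglobin reads."""
    if not 0.0 <= globin_fraction < 1.0:
        raise ValueError("globin_fraction must be in [0, 1)")
    return 1.0 / (1.0 - globin_fraction)


# Fractions taken from the range (24-75%) and mean (48%) reported in this study.
for fraction in (0.24, 0.48, 0.75):
    factor = depth_multiplier(fraction)
    print(f"{fraction:.0%} hemoglobin reads: about {factor:.2f}x the depth")
```

Under these assumptions, the average loss of 48% corresponds to roughly twice the sequencing depth, while the most affected sample (75% lost) would need four times the depth.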

6 Acknowledgments

We are grateful to Dr. Kálmán Czeibert PhD for his help with the veterinary checks of the dogs and with blood sampling. We are grateful to KIFÜ (the [Hungarian] Governmental Agency for IT Development), which provided both the necessary supercomputing resources, based in Debrecen, Hungary, and the technical support to successfully implement this project. We would also like to express our gratitude to the staff of iBioScience (Pécs, Hungary); their expert opinions were of high value and were influential for this project.

7 Sources of funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 680040) and from the Hungarian Academy of Sciences via a grant to the MTA-ELTE ‘Lendület/Momentum’ Companion Animal Research Group (grant no. Ph2404/21).

8 Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

9 Author Contributions

KT did the wet-lab experiments. DJ performed the analysis. DJ and KT wrote the manuscript. EK edited the manuscript. EK and BE participated in the interpretation and discussion of the results. EK provided funding to the project. All authors read and approved the final manuscript.

10 Ethics

All procedures complied with national and EU legislation and institutional guidelines, in strict accordance with the recommendations in the International Society for Applied Ethology guidelines for the use of animals in research. The study received ethical permission from the Hungarian Pest County Governmental Office following the ethical review of the Eötvös Loránd University (Permission No.: PE/EA/301-4/2021). Dog owners provided written consent to their voluntary participation. We took special care to ensure that the consent process was understood completely by the dog owners. In the Consent Form, participants were informed about the identity of the researchers, the aim, procedure, location and expected time commitment of the experiment, the handling of personal and research data, and data reuse. The information included the participant’s right to withdraw their consent at any time.

11 Data Availability Statement

The genetic data generated and analysed for this study can be found in the [American] National Center for Biotechnology Information’s Sequence Read Archive under BioProject ID: PRJNA823683. The laboratory blood test data is available in Supplementary Data 1.

12 Consent for publication

Not applicable.

References

Flatt T, Partridge L. Horizons in the evolution of aging. BMC Biol. 2018. https://doi.org/10.1186/s12915-018-0562-z.
López-Otín C, Blasco MA, Partridge L, Serrano M, Kroemer G. The hallmarks of aging. Cell. 2013. https://doi.org/10.1016/j.cell.2013.05.039.
Zannas AS. Gene-environment interactions in late life: linking psychosocial stress with brain aging. Curr Neuropharmacol. 2018. https://doi.org/10.2174/1570159x15666171109121452.
Gilmore KM, Greer KA. Why is the dog an ideal model for aging research? Exp Gerontol. 2015. https://doi.org/10.1016/j.exger.2015.08.008.
Kaeberlein M, Creevy KE, Promislow DEL. The dog aging project: translational geroscience in companion animals. Mamm Genome. 2016. https://doi.org/10.1007/s00335-016-9638-7.
Ambrosini YM, Borcherding D, Kanthasamy A, Kim HJ, Willette AA, Jergens A, Allenspach K, Mochel JP. The gut-brain axis in neurodegenerative diseases and relevance of the canine model: a review. Front Aging Neurosci. 2019. https://doi.org/10.3389/fnagi.2019.00130.
Chapagain D, Range F, Huber L, Virányi Z. Cognitive aging in dogs. Gerontology. 2018. https://doi.org/10.1159/000481621.
Sándor S, Kubinyi E. Genetic pathways of aging and their relevance in the dog as a natural model of human aging. Front Genet. 2019. https://doi.org/10.3389/fgene.2019.00948.
Sándor S, Jónás D, Tátrai K, Czeibert K, Kubinyi E. Poly(A) RNA sequencing reveals age-related differences in the prefrontal cortex of dogs. Geroscience. 2022. https://doi.org/10.1007/s11357-022-00533-3.
Sándor S, Czeibert K, Salamon A, Kubinyi E. Man’s best friend in life and death: scientific perspectives and challenges of dog brain banking. GeroScience. 2021. https://doi.org/10.1007/s11357-021-00373-7.
Harrington CA, Fei SS, Minnier J, Carbone L, Searles L, Davis BA, Ogle K, Planck SR, Rosenbaum JT, Choi D. RNA-seq of human whole blood: evaluation of globin RNA depletion on Ribo-Zero library method. Sci Rep. 2020. https://doi.org/10.1038/s41598-020-62801-6.
Yang X, Zhang H, Shang J, Liu G, Xia T, Zhao C, Sun G, Dou H. Comparative analysis of the blood transcriptomes between wolves and dogs. Anim Genet. 2018. https://doi.org/10.1111/age.12675.
Jónás D, Sándor S, Tátrai K, Egyed B, Kubinyi E. Differential gene expression analysis suggests that translation regulation was a crucial prerequisite of dog domestication. Submitted.
Ezer S, Yoshihara M, Katayama S, DoGA consortium, Daub C, Lohi H, Krjutskov K, Kere J. Generation of RNA sequencing libraries for transcriptome analysis of globin-rich tissues of the domestic dog. STAR Protoc. 2021. https://doi.org/10.1016/j.xpro.2021.100995.
Schaum N, Lehallier B, Hahn O, Pálovics R, Hosseinzadeh S, Lee SE, Sit R, Lee DP, Losada PM, Zardeneta NE, Fehlmann T, Webber JT, McGeever A, Calcuttawala K, Zhang H, Berdnik D, Mathur V, Tan W, Zee A, Tan M, Tabula Muris Consortium, Pisco AO, Karkanias J, Neff NF, Keller A, Darmanis S, Quake SR, Wyss-Coray T. Ageing hallmarks exhibit organ-specific temporal signatures. Nature. 2020. https://doi.org/10.1038/s41586-020-2499-y.
Hong M-G, Myers AJ, Magnusson PKE, Prince JA. Transcriptome-wide assessment of human brain and lymphocyte senescence. PLoS ONE. 2008. https://doi.org/10.1371/journal.pone.0003024.
Peters MJ, Joehanes R, Pilling LC, Schurmann C, Conneely KN, Powell J, Reinmaa E, Sutphin GL, Zhernakova A, Schramm K, Wilson YA, Kobes S, Tukiainen T, Consortium NABEC/UKBEC, Ramos YF, Göring HHH, Fornage M, Liu Y, Gharib SA, Stranger BE, De Jager PL, Aviv A, Levy D, Murabito JM, Munson PJ, Huan T, Hofman A, Uitterlinden AG, Rivadeneira F, van Rooij J, Stolk L, Broer L, Verbiest MMPJ, Jhamai M, Arp P, Metspalu A, Tserel L, Milani L, Samani NJ, Peterson P, Kasela S, Codd V, Peters A, Ward-Caviness CK, Herder C, Waldenberger M, Roden M, Singmann P, Zeilinger S, Illig T, Homuth G, Grabe H-J, Völzke H, Steil L, Kocher T, Murray A, Melzer D, Yaghootkar H, Bandinelli S, Moses EK, Kent JW, Curran JE, Johnson MP, Williams-Blangero S, Westra H-J, McRae AF, Smith JA, Kardia SLR, Hovatta I, Perola M, Ripatti S, Salomaa V, Henders AK, Martin NG, Smith AK, Mehta D, Binder EB, Nylocks KM, Kennedy EM, Klengel T, Ding J, Suchy-Dicey AM, Enquobahrie DA, Brody J, Rotter JI, Chen Y-DI, Houwing-Duistermaat J, Kloppenburg M, Slagboom PE, Helmer Q, den Hollander W, Bean S, Raj T, Bakhshi N, Wang QP, Oyston LJ, Psaty BM, Tracy RP, Montgomery GW, Turner ST, Blangero J, Meulenbelt I, Ressler KJ, Yang J, Franke L, Kettunen J, Visscher PM, Neely GG, Korstanje R, Hanson RL, Prokisch H, Ferrucci L, Esko T, Teumer A, van Meurs YBJ, Johnson AD. The transcriptional landscape of age in human peripheral blood. Nat Commun. 2015. https://doi.org/10.1038/ncomms9570.
Viñuela A, Brown AA, Buil A, Tsai P-C, Davies MN, Bell JT, Dermitzakis ET, Spector TD, Small KS. Age-dependent changes in mean and variance of gene expression across tissues in a twin cohort. Hum Mol Genet. 2018. https://doi.org/10.1093/hmg/ddx424.
Charruau P, Johnston RA, Stahler DR, Lea A, Snyder-Mackler N, Smith DW, vonHoldt BM, Cole SW, Tung J, Wayne RK. Pervasive effects of aging on gene expression in wild wolves. Mol Biol Evol. 2016. https://doi.org/10.1093/molbev/msw072.
Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. Available at: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. Downloaded: 16/07/2018.
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011. https://doi.org/10.14806/ej.17.1.200.
22.
Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, Azov AG, Bennett R, Bhai J, Billis K, Boddu S, Charkhchi M, Cummins C, Da Rin Fioretto L, Davidson C, Dodiya K, Houdaigui BE, Fatima R, Gall A, Giron CG, Grego T, Guijarro-Clarke C, Haggerty L, Hemrom A, Hourlier T, Izuogu OG, Juettemann T, Kaikala V, Kay M, Lavidas I, Le T, Lemos D, Martinez JG, Marugán JC, Maurel T, McMahon AC, Mohanan S, Moore B, Muffato M, Oheh DN, Paraschas D, Parker A, Parton A, Prosovetskaia I, Sakthivel MP, Salam AIA, Schmitt BM, Schuilenburg H, Sheppard D, Steed E, Szpak M, Szuba M, Taylor K, Thormann A, Threadgold G, Walts B, Winterbottom A, Chakiachvili M, Chaubal A, De Silva N, Flint B, Frankish A, Hunt SE, Ilsley GR, Langridge N, Loveland JE, Martin FJ, Mudge JM, Morales J, Perry E, Ruffier M, Tate J, Thybert D, Trevanion SJ, Cunningham F, Yates AD, Zerbino DR, Flicek P. Ensembl 2021. Nucleic Acids Res. 2021. https://doi.org/10.1093/nar/gkaa942.
Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015. https://doi.org/10.1038/nmeth.3317.
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010. https://doi.org/10.1093/bioinformatics/btq033.
Broad Institute. Picard Toolkit. 2019. Available online at: GitHub Repository – http://broadinstitute.github.io/picard/. Accessed: 16th of July, 2018.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014. https://doi.org/10.1186/s13059-014-0550-8.
Love MI, Anders S, Kim V, Huber W. RNA-seq workflow: gene-level exploratory analysis and differential expression. F1000Res. 2015. https://doi.org/10.12688/f1000research.7035.1. Available online: https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html.
Zhao S, Li C-I, Guo Y, Sheng Q, Shyr Y. RnaSeqSampleSize: real data based sample size estimation for RNA sequencing. BMC Bioinformatics. 2018. https://doi.org/10.1186/s12859-018-2191-5.
Thomas PD, Kejariwal A, Guo N, Mi H, et al. Applications for protein sequence-function evolution data: mRNA/protein expression analysis and coding SNP scoring tools. Nucleic Acids Res. 2006. https://doi.org/10.1093/nar/gkl229.
Mi H, Ebert D, Muruganujan A, Mills C, Albou L-P, Mushayamaha T, Thomas PD. PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API. Nucleic Acids Res. 2021. https://doi.org/10.1093/nar/gkaa1106.
Davis CA, Hitz BC, Sloan CA, Chan ET, Davidson JM, Gabdank I, et al. The encyclopedia of DNA elements (ENCODE): Data portal update. Nucleic Acids Res. 2018. https://doi.org/10.1093/nar/gkx1081.
Shin H, Shannon CP, Fishbane N, Ruan J, Zhou M, Balshaw R, Wilson-McManus JE, Ng RT, McManus BM, Tebbutt SJ, PROOF Centre of Excellence Team. Variation in RNA-seq transcriptome profiles of peripheral whole blood from healthy individuals with and without globin depletion. PLoS ONE. 2014. https://doi.org/10.1371/journal.pone.0091041.
Feriel J, Tchipeva D, Depasse F. Effects of circadian variation, lifestyle and environment on hematological parameters: a narrative review. Int J Lab Hematol. 2021. https://doi.org/10.1111/ijlh.13590.
