Sequencing and read assembly
The RNA-sequencing of the sixteen PFC samples generated an average of 78.3 million paired-end reads per sample. The mean mapping proportions obtained with the STAR software was 91.8%, similar among different samples (from 88.07 to 94.91%) (Supplementary Table 3). The mapped reads were processed with Cufflinks toolkits for differential expression analysis, revealing a total of 16,384 DEG between the aggressive and non-aggressive groups; of those genes, 1,196 were statistically significant, producing 10,640 isoforms (8.86 transcripts per gene) (Table 1, Figure 1A). Gene expression differences of the up-regulated DEG (log2FC ≥ 0.1) were greater in number, involving 918 genes, than those down-regulated; 278 DEG (log2FC ≤ 0.1) (Figure 1B and C). For the complete list of up and down-regulated DEG see Supplementary Table 4.
Genes in common with the cross-species comparative analysis (CSCA)
The up and down-regulated DEGs ≥1 WR values were compared with the compendium genes-list associated with aggressive behavior (Supplementary Table 1). This comparison yielded 50 genes, 24 up and 26 down-regulated in the aggressive group of Lidia individuals (Table 2).
Functional annotation and biological pathway analysis
A GO analysis of the pathways and biological processes identified in the dataset lists containing significant up and down-regulated transcripts was carried out. Among the 918 up-regulated DEGs in aggressive Lidia samples, Panther Over Representation test included 851 uniquely mapped IDs, displaying significant association with 881 GO biological processes (FDR ≤ 0.05), most of them related to heart morphogenesis and heart development, cellular adhesion, migration and differentiation, skeletal and smooth muscle development, central nervous system (SNC) development and function, and immune response (Supplementary Table 5). The Panther Pathway enrichment analysis retrieved five significant pathways: blood coagulation, integrin signaling, Alzheimer disease-presenilin, angiogenesis and gonadotropin-releasing hormone receptor pathways (Table 3A).
Within the down-regulated DEGs in the aggressive cohort, the GO biological processes included 260 genes as uniquely mapped IDs implicated in 243 processes (FDR ≤ 0.05), the highest significant values being dendritic cell cytokine production, trans-synaptic signaling by endocannabinoid, trans-synaptic signaling by lipid, negative regulation of renin secretion into blood stream and melanocyte adhesion, all with 84.4 fold enrichment and two genes associated with each process (Supplementary table 6). The Panther enrichment pathway analysis retrieved two significant down-regulated pathways in the aggressive Lidia breed, both involved in two different types of Heterotrimeric G-protein signaling (Table 3B).
Signaling networks and upstream regulators enrichment analysis
We used the IPA software to identify pathways to which the top DEGs (≥1 WR values) in common with the CSCA belong, as well as to explore the prediction of signaling networks connecting the DEGs.
Significant results are summarized in Supplementary table 5. The most relevant results were obtained under the physiological system development and function and the disease and disorders categories. Within these categories, the top of the list gathered terms related with Nervous system development and function (highest p-value range of 4.10E-08 and 6 DEGs), and Neurological disease (highest p-value range of 6.33E-06 and 5 DEGs), and Psychological disorders (highest p-value range of 6.33E-06 and 3 DEGs) in their respective categories.
The top-scoring regulatory network predicted that 6 DEG; four up (IGF2, COL13A1, RAB3IL1 and SCARA5) and two down-regulated DEGs (ADCYAP1 and BDNF) in the aggressive cohort display interaction with 35 molecules. Two of those 6 DEGs, the up-regulated IGF2 and the down-regulated BDNF interact with most of the network´s molecules (Figure 2). Furthermore, the functional network analyses predicted that16 of this molecules are associated with behavioral function, among them aggressive behavior (p-value 2.99E-05) (Table 4).
Finally, the upstream analysis tool of the IPA package was used to identify the potential upstream regulators that may explain the differential patterns of expression between the up and down regulated DEGs in common with the CSCA in the aggressive cohort. By doing so, five main upstream regulators were identified: Insulin-Like Growth factor 2- Antisense RNA (IGF2-AS; p-value 2.53E-07), Neurotrophic Receptor Tyrosine Kinase 1 (NTRK1; P-value 2.32E-05), Zinc finger BED-Type Containing 6 (ZBDE6; p-value 4.71E-05), RAD21 Cohesin complex component (RAD21; p-value 5.58E-05), and Hedgehog (Hh; p-value 1.03E-04) (Figure 3). All these genes, RNAs and proteins appear to be involved in a heterogeneous array of biological functions related to behavior development and cell-to-cell signaling interactions.
Statistical analysis of aggression-associated differentially expressed genes (DEG)
In order to test whether the 50 DEGs with WR values of 1 or above identified in common with the CSCA, represent a statistically significant association with aggressive behavior, we calculated the cumulative hypergeometric probability of this overlap occurring. Following removal of genes with no known orthologues in cattle from the list of aggression-associated genes, 1,701 genes remained. Of these, 654 had a weighted ranking of 1 or above. Among the 1,196 Lidia DEGs, 1,157 had known one-to-one orthologues with humans, of which 50 were matches among the 654 genes with WR≥1.
Out of the estimated 22,000 genes in the bovine genome [42], the probability of there being 50 or more DEGs among the 654 aggression-associated genes was significantly above chance (p=0.005). When restricting our analysis only to genes likely to be expressed in the cortex based on findings in other mammals―estimated at 85% of protein-coding genes in the genome [43] (18,700 genes in the case of cattle)―the probability of having 50 genes in our intersection was slightly more likely to have occurred by chance (p-value=0.07).
It could be considered that brain-expression studies of aggression in model animals (e.g. mouse, rat, and fox) are most similar in kind to our study. When we took only genes weight-ranked 1 or above that had been identified in previous expression studies (i.e. identified in at least two expression studies, or in one such study, as well as at least one GWAS, selective sweep, knock-out, OMIM, or OMIA) 96 genes remained from our CSCA. Of these 13 were also present among the Lidia DEGs, a number significantly unlikely to have occurred by chance even under the restrictive analysis limiting our total genome population to the estimated 18,700 brain-expressed genes (p-value=0.006). It should be noted that under more permissive analyses, where weighted ranking was not taken into account, all intersections between cattle DEGs and aggression-associated genes were significant, whether considering a genome population size of 22,000 or 18,700 genes, and whether considering all or only brain expression studies. These results confirm Lidia cattle as a valid model for the study of reactive aggression.
In the comparison with the high-frequency mutations and selective sweep studies in archaic humans, modern humans, and domesticated species (thirteen gene sets in total, compiled in [40]), the only significant intersection was between the Lidia DEGs and genes with high frequency regulatory mutations in archaic compared to modern humans. 88 of the 1,157 DEGs with known human orthologues were found among the 1,003 genes with archaic high-frequency regulatory mutations (p-value=0.0005, considering 18,700 as the total gene population size). This remained significant following Bonferroni correction for multiple comparisons (p-value=0.007).