Potential Functional Variants in Innate Immune Response Genes Associated with Feed Efficiency in Beef Cattle

Background: Identifying and selecting animals for feed efficiency (FE) is extremely important for the beef production chain. Currently, the most common parameter to access the FE animals is residual feed intake (RFI), which is the residual of the linear regression that estimates DMI based on average daily gain and mid-test metabolic body weight. However, it relies on costly and time-consuming data collection, creating a growing demand for alternative approaches to identify genetically superior animals for FE. This study aimed to detect potential liver-specific functional variants from RNA-seq data of 16 Nellore bulls divergently selected for FE. Results: The variant call analysis detected 247 missense SNPs and nine insertion-deletions (INDELs) that alter the protein functions. These variants were found within 190 genes differentially found (P < 0.05) in liver tissue between high FE (HFE) and low FE (LFE) animals. To better understand the role of these variants in biological pathways, we performed a functional enrichment analysis, which highlighted six genes involved in complement cascade and cascade complement regulation pathways, and 20 genes involved in the regulation of the innate immune system. They had four different significant variants in the complement factor H ( CFH ) family genes, and all were homozygous in HFE animals rather than some degree of heterozygous in LFE animals. Conclusion: We developed a pipeline to detect potential liver-specific functional variants from RNA-seq data from animals divergently selected for feed efficiency. With this approach, we found potential functional variants in innate immune response genes associated with feed efficiency in beef cattle.


Background
Studies on feed efficiency (FE) in beef cattle are very important. The identification and selection of efficient animals can improve the productivity by reducing feed costs, which can reach 75% in feedlot systems (1). FE selection is also justified by the demand for less environmental impact of livestock.
Feed efficient animals are recognized as more sustainable due to decreased production of waste and greenhouse gases.
One of the most common ways to evaluate FE is by residual feed intake (RFI), an independent measure of the level size and growth rate in beef cattle (2). At least five major physiological processes contribute to RFI as intake of feed, digestion of feed, metabolism, physical activity and thermoregulation (3). Thus, the liver is considered a central organ for FE since it is responsible for the metabolism of nutrients as proteins, lipids and carbohydrates, along with other functions as metabolism of bilirubin, bile acids, xenobiotics, protein synthesis and immunity (4). Our group was the first to described a pathophysiological mechanism associated with FE in beef cattle: liver inflammation due to altered metabolism and/or bacterial translocation/infection (5), which was in part corroborated by others (6)(7)(8). We also unravel the metabolic pathways related to FE in Nellore cattle showing an increased bacterial load in low feed efficient animals (LFE) which is in part responsible for the hepatic lesions and inflammation in these animals (9).
The RFI estimation in beef cattle is costly and time-consuming, therefore other approaches to identify genetically superior animals are needed. A usual strategy is the use of molecular markers for genomic selection based on SNPs (single nucleotide polymorphisms) genotyping, which allows the detection of DNA variants associated with FE. However, the majority of the SNPs previously identified in Genome-Wide Association studies (GWAs) and expression quantitative loci (eQTL) association study (10) are non-causal variants (11-13), which are located in intron or intergenic genomic regions. There is an urgent need to establish the functional (causal) variants of this important phenotype. One possibility is the analysis of whole genome sequencing, an approach which is still very costly. Another possibility is the whole exome sequencing that has a reduced cost, but still lack information regarding the importance of the polymorphism within cell and/or tissue architecture. Therefore, we propose an approach to overcome these limitations based on the identification of genetic variants from RNAsequencing (RNA-seq) data from a physiologically phenotype-related organ, followed by a classification of the potential functional variants according to their effects on protein expression and function.
In this work, we developed a pipeline to detect potential liver-specific functional variants from RNAseq data from animals divergently selected for feed efficiency. With this approach, we found potential functional variants in innate immune response genes associated with feed efficiency in beef cattle.

Phenotypic data and biological sample collection
All animal protocols were approved by the Institutional Animal Care and Use Committee of Faculty of Food Engineering and Animal Sciences, University of São Paulo (FZEA-USP -protocol number 14.1.636.74.1).
All procedures to collect the phenotypes and biological samples were carried out at FZEA-USP, Pirassununga, State of São Paulo, Brazil. Ninety-eight Nellore bulls (16 to 20 months old and 376 ± 29 kg BW) were evaluated in a feeding trial comprised of 21 days of adaptation to feedlot diet followed by a 70-day period of data collection. Total mixed ration was offered ad libitum and daily dry matter intake (DMI) was individually measured. Animals were weighted at the beginning, at the end and every two weeks during the experimental period. Feed efficiency (FE) was estimated by residual feed intake (RFI) (2). At the end of the experimental period, liver biopsy samples from the left hepatic lobes were collected from each animal and quickly frozen in liquid nitrogen and stored at -80 °C.
Further information about management and phenotypic measures of the animals used in this study can be found elsewhere (5).

RNA-seq data
Samples of 8 animals from each FE group (high and low) were selected for RNA-seq using RFI measure extremes. The total RNA from liver samples was extracted by using the RNeasy mini kit MAF < 40% and call rate equal to 50%. For the allele frequency test between HFE and LFE groups, the Cochran-Armitage trend analysis was used considering significant differences when P<0.05.

Characterization of the effects of variants on protein sequence and function
The potential functional variants were analyzed in the Variant Effect Predictor (VEP) online tool (17) which predicts the functional effects of the variants. Potential functional variants were analyzed by the Scale-Invariant Feature Transform (SIFT) score, a statistical tool of the VEP online software. SIFT is an algorithm that predicts whether an amino acid substitution affects the function of the protein. This analysis was performed on the basis of the gene sequence homology and the physical properties of the amino acids, where the like sequences are first sought, then strictly related sequences (which may share functions similar to the query sequence), with the alignment of the sequences chosen is possible to calculate the normalized probability for all possible substitutions of the alignment. Thus, generating a SIFT score ranging from 0 to 1, being classified as deleterious values below 0.05 and above 0.05 is considered as tolerated. The deleterious classification indicates that the amino acid change will have a great impact on the proteins, since the tolerated classification indicates that the impact will not be so great. With this information it was possible to compare the position of the variants in the protein with the protein database of the National Center for Biotechnology (NCBI), in order to find information about the name of the region and what function that region performs.

Functional enrichment analysis
The functional enrichment analysis was performed with Panther version 14.1 (18) to identify biological pathways over-represented in the set of genes with potential functional variants. A multiple test correction was used and significant pathways were considered when p<0.05.

Characterization of feed efficiency groups
Two groups of eight animals with extreme values of feed efficiency were selected and named High Feed Efficient (HFE, lowest RFI) and Low Feed Efficient (LFE, highest RFI). These groups were significantly different for feed efficiency traits RFI, feed conversion ratio (FCR), residual weight gain (RWG), residual intake and weight gain (RIG), for dry matter intake (DMI) and average daily gain (ADG) ( Table 1). The LFE animals presented higher backfat thickness at the end of the experiment (P<0.05), supporting that HFE animals are more feed efficient because they eat less, have similar ADG and are leaner than LFE animals (5,19). A total of 11,361 genes were expressed in liver samples from HFE and LFE animals and eight genes were differentially expressed (DE) (P ≤ 0.1): NR0B2, SOD3, RHOB, Bta-mir-2904-2, FTL, CYP2E1, GADD45G and FASN (5).  Fig. 1).

Characterization of expressed SNPs and INDELs associated with feed efficiency
We considered the variants with moderate and high impact as potential functional variants totalizing 256 variants (247 SNPs and 9 INDELs) selected for further analysis. They generated effects on proteins as follows: shorter proteins due to frameshift INDELs, bigger proteins due to insertion of one or two AA by inframe insertion INDELs, changes in single amino acid (AA) on proteins due to missense SNPs and a stop retained INDEL (able to change at least one base of the terminator codon but maintaining the function of the terminator). These 256 potential functional variants were distributed across 190 genes, as some variants were found in spliced isoforms and there were genes with more than one variant (Supplementary table 1).

Depicting the biology of potential functional variants
In order to understand the importance of these 256 potential functional variants we performed the functional enrichment analysis for the selected genes, which detected three different significant pathways (False Discovery Rate -FDR<0.10): "Regulation of Complement cascade", "Complement cascade" and "Innate Immune System" with fold enrichments of 14.79x, 12.19x and 2.57x respectively ( Table 2).
The three pathways associated with FE by our approach were all related to the innate immune system, in particular with the regulation of the complement cascade. The first two pathways enriched from the same set of six genes (Table 3) and the Innate Immune System enriched from twenty genes (Table 3), including the six genes from the complement cascade/regulation of complement cascade pathways. The list of these 20 genes, their potential functional variants, the impact on protein sequence and the frequency in each group can be found in the Supplementary table 2. Considering the "regulation of complement cascade" and "complement cascade" as one pathway since they were enriched for the same set of genes, it calls the attention the overrepresentation of Complement H factor genes (complement factor H-related 5, complement factor H and complement factor H precursor). There were four different significant variants in these genes and all were homozygous in HFE animals instead of some degree of heterozygosity in LFE animals ( Table 4).

Effects of other important potential causal variants
At last, we analyzed the effects of high impact INDELs on protein function since they altered the protein sequence considerably but were not enriched in the processes already described. The

Discussion
There is increasing evidence for the importance of the immune system on feed efficiency in a variety of domestic species. Our group previously showed that feed efficiency is associated with altered inflammatory response in beef cattle partially because of bacterial translocation from the digestive tract to the liver (5,9). Here we reported for the first time the existence of potential functional variants in immune system-related genes associated with feed efficiency, with the potential to alter the function of the innate immune system by regulation of the complement cascade. To achieve these results, we performed a screening for potential functional variants from transcriptomic data of animals evaluated for feed efficiency using RNA-seq data and bioinformatic tools such as GATK, VEP, functional enrichment using Panther and evaluation of effects from the potential variants.
Our pipeline found potential functional variants in genes that significantly enriched for three biological pathways related to the Immune System: "Regulation of Complement Cascade", "Complement Cascade" and "Innate Immune System". Similar genes enriched for Regulation of Complement  (20,21). This enzymatic cascade helps in the defense of infections and is one of the main effectors of humoral immunity, regulating various biological processes such as phagocytosis, opsonization, leukocyte chemotaxis, release of mast cells, basophils and active oxygen species by leukocytes, vasoconstriction, smooth muscle contraction, increased vessel permeability, platelet aggregation and cytolysis (22,23). Activation of this cascade can be performed by 3 pathways (classical, lectin and alternative) and the genes found in our study regulate the alternative pathway, which is triggered in the presence of an exogenous activator, such as the presence of fungi, bacteria, some types of viruses and parasites, which activate C3 molecules and trigger the cascade (24)(25)(26). As already mentioned, we showed in previous works that less feed efficient animals have increased inflammatory response in the liver (5) and there is a higher level of endotoxins in the blood of LFE animals (9) which corroborates with the activation of the complement cascade by the alternative pathway. In fact, another study comparing the hepatic transcriptome of high and low FE animals found the complement system as the most significant canonical pathway enriched from the differential expressed genes (7).
The proteins responsible for regulation of the complement activation (RCA) can be divided into two main groups: membrane-bound regulators and soluble regulators (27). Factor H Complement belongs to the RCA family and we found different alleles predominantly in LFE animals. The CFH also has five additional members, represented by five separate CFH-related genes (CFHR), ranging from 1 to 5. The proteins produced by CFHR contain a set of domains that are homologous to those of CFH (28). These proteins act at the C3 level (29) negatively regulating complement activation, acting as a cofactor for factor I-mediated C3b cleavage and facilitating C3 convertase acceleration of deterioration (28,30). In this work, we found heterozygosity in the LFE animals for the CFH genes whereas the HFE animals were homozygous. Another gene that influences the complement system in all pathways (classical, lectin and alternative) is the C8 gene found in heterozygosity only in LFE animals. The C8 protein is part of the complex membrane attack complex (MAC) that assembles on bacterial membranes to form a pore, allowing the disruption of bacterial membrane organization, cell death and consequently inflammatory response (31,32).
The genetic modulation of the innate immune system supports our previous studies that demonstrate the relationship between feed efficiency and hepatic inflammation (5,9). Along with the genes related to complement cascade regulation, other genes with potential functional variations enriched for the innate immune system as BOLA, ECSIT, GLB1, ADA2, RIPK3, LILRA4, SIRPB1 and YEAR6. Olivieri and colleagues found candidate genes associated with feed efficiency traits in Nellore cattle involved in immune system as well as other processes (33). Specifically, they proposed the NLRP14, which was shown to regulate the innate immune signaling. In humans, a germline mutation in NLRP14 impairs its function and affects the innate immune signaling (34) as well as other polymorphisms in complement genes are linked to human diseases as hemolytic uremia and age-dependent macular degeneration (35), supporting the idea that potential functional variations affect the innate immune system.
Although we haven't found the same candidate genes from the GWAS study of Olivieri and colleagues (2016), our approach confirmed the innate immune system as an important pathway for feed efficiency in beef cattle.
Our initial idea was to identify causal genetic variants for feed efficiency and found these to be present exclusively in one group (LFE or HFE). However, in our study none of the 256 potential functional variants affecting protein sequences were found exclusively in one group and that is the reason why we choose not to call these as causal variants. It is shown for complex traits that even the most important loci in the genome have small effects sizes, and that, together, the significant hits only explain a modest fraction of the predicted genetic variance (36). Indeed, in one of the largest performed GWAS study for feed efficiency in beef cattle, the authors found ten significant quantitative trait locus (QTL) for RFI but none explained more than 2.5% of the additive genetic variance in any population (12).
We are aware that one possible limitation of the present work is the sample size but even in the present condition the results are well supported by the literature in transcriptomic (5)(6)(7) and GWAs studies (33), including other species as pigs (37,38) and poultry (39). Therefore, the existence of potential functional variations in genes of the innate immune response affecting feed efficiency in beef cattle seems plausible and worth future studies.

Conclusion
In this research, we found possible hepatic causal variants that modulate the expression/function of genes involved in the pathways innate immune response. Having as one of the main findings, four applicable homozygous variants for HFE animals in the CFH and CFH5 genes, which are responsible for phagocytosis of microbes and damaged cells that induce infection. Occasional changes in these variants regulate the alteration of immunity-related biological pathways, may alter or modify the model, which in our case is economical.

Availability of data and materials
The dataset supporting the conclusions of this article is available in the ArrayExpress database

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.