Pathway-based analysis enables deeper mining of GWAS data on H/L in chickens

Background Heterophils are refers to white blood cells with phagocytosis and killing functions in the avian immune system, These cells identify and kill pathogenic microorganisms through precise regulatory mechanisms. As a simple index, the heterophil/lymphocyte ratio (H/L) in the blood reects the immune system status of chickens. H/L is a complex trait affected by multiple genetic loci, but single nucleotide polymorphisms (SNPs) signicantly associated with traits can usually explain only part of the phenotypic variation. Combining a genome-wide association study (GWAS) with pathway analysis can improve understanding of the biological pathways affecting traits. Our objective was to conduct a SNP-and pathway-based analysis to identify possible biological mechanisms involved in H/L traits. Methods GWAS for H/L was performed in 1,317 Cobb broilers to identify signicant single nucleotide polymorphisms (SNPs) associated with H/L. Eight SNPs (P< 1/8,068) reached a signicant level of association. The following results were obtained through a series of analyses, including pathway-based association analysis and analysis of the proportion of phenotypic variance explained by SNPs. Results On the basis of GWAS analysis results, one associated SNP (5% genome-wide signicance (6.80E-6,0.05/8,068)) on GGA 1(chicken chromosome 1) and seven suggestively associated SNPs with a trend toward signicance(1.24E-4, 1/8,068) on GGAs 1, 7, 13 and 19 were detected. The signicant SNP on GGA 19 was in Complement C1q Binding Protein (C1QBP). The wild-type and mutant individuals showed signicant differences in H/L at ve loci (P< 0.05). According to the results of pathway-based analysis, nine associated pathways (P < 0.05) were identied, including small cell lung cancer, proteoglycans in cancer, sulfur relay system, pathways in cancer, gastric acid secretion and purine metabolism. By combining GWAS with pathway analysis, we found that all SNPs


Background
With the continuous increase in intensive agriculture, the poultry industry has developed rapidly. Disease has become a major issue in the poultry industry in China. Vaccination is a common method of disease prevention and treatment, but it has raised concerns regarding expense and meat safety. Disease resistance in chickens can be improved through genetic selection for immunocompetence. Heterophils are found in the bone marrow, blood and connective tissue in poultry, and are the main leukocyte components in poultry. The function of heterophils is similar to that of mammalian neutrophils, which phagocytose and kill invading pathogens and protect the body from invasion [1]. Heterophils play a crucial role in killing foreign pathogenic microorganisms when the speci c immune system of the body is not mature. H/L, the ratio of heterophil to lymphocyte, can be used to study the corresponding immune effects directly at the cellular level. The heterophil/lymphocyte ratio (H/L) in chicken peripheral blood has been widely accepted as a reliable and accurate physiological indicator of the chicken stress response [2].
With high or low temperatures, bacterial infection and other stress reactions, the number of heterophils increases [3,4], H/L can be used as a blood index re ecting the disease resistance of the body [5].
Genome-wide association study(GWAS) can provide preliminary results in genetic research on complex traits in animals. Many single nucleotide polymorphism (SNP) loci and qualitative trait loci(QTL) with strong statistical signi cance have been discovered [6][7][8][9]. GWAS has become a powerful tool for studying important economic traits of livestock and poultry and has promoted research on marker-assisted breeding. However, GWAS cannot be used to identify the causative variation directly association with the traits. The correlation signals that a linked regions. However, pinpointing the causative variation is more di cult. To reveal the mechanism through which a mutation site affects a phenotype and to perform subsequent functional research, GWAS joint analysis on multiple genomic levels can be used and biological pathway analysis can be applied to detect the superposition of multiple minor genes by examining genes involved in the same biological pathway, thus enabling deeper mining of GWAS data [10][11][12][13]. With the development of genome sequencing technology and the continuous improvement of statistical methods, GWAS is expected to be more e ciently applied to gene identi cation for important traits in livestock and poultry and to play an increasingly important role in animal breeding.
Our goal was to identify the candidate genes associated with H/L disease-resistance traits, with the goal of mitigating damage, and to conduct a pathway analysis to complement previously obtained GWAS results for phenotypes associated with H/L.

Ethics statement
The work was approved by the Animal Management Committee of the Institute of Animal Sciences, Chinese Academy of Agricultural Sciences (IAS-CAAS, Beijing, China). Ethical approval regarding animal survival was granted by the animal ethics committee of IAS-CAAS (approval number: IASCAAS-AE20140615).

Phenotypes and breeding value estimation
In chickens at 42 days of age, 10 µL of blood was freshly obtained from the wing. Slides were air-dried and dyed with May-Grunwald-Giemsa stain, and the H/L ratio were calculated. The basic data have been used in recent GWAS analysis, and details regarding the genotyping and phenotyping have been reported [14]. In this study, a conventional pedigree-based best linear unbiased prediction model (BLUP) was used to predict breeding values for each trait.
The pedigree-based BLUP model [15] is where y is the vector of the phenotypic records of the trait (H/L), b is the vector of the xed effect (Batch and Sex), X is the incidence matrix linking b to y, a is a vector of additive breeding values to be estimated, Z is the incidence matrices linking a to y and e is the vector of residuals. We assumed that var(a)=Aσ 2 a , where A is the pedigree-based genetic relationship matrix. The prediction of breeding values(ebv) was performed the Asreml package [16]. The ebv calculated in this part is used for the next GWAS analysis Genotyping and quality control Brie y, 1,317 birds were genotyped with Axiom® SNP arrays [17] with the Affymetrix GeneTitan® system according to the procedure described in the Axiom 2.0 Target Prep 384 Samples Protocol (Affymetrix; Santa Clara, CA, USA) (https://www.thermo sher.com/). Stringent quality control (QC) procedures were performed for the genotype data by using PLINK [18]  Single SNP-based association analysis The basic MLM model was used for association analysis in GAPIT [19]. The P-value was corrected with a strict Bonferroni adjustment based on LD pruning [20]. The sums of the independent LD blocks plus singleton markers were used to de ne the number of independent statistical comparisons [21]. Finally, 8,068 independent SNPs were used to determine the P-value thresholds, including 5% genome-wide signi cance (6.80E-6,0.05/8,068), and suggestive association (1.24E-4, 1/8,068). The Quantile-quantile (Q-Q) plot and Manhattan plots of GWAS for H/L were produced with the "qqman" packages [22] in R. The most signi cant SNP genotypes were added to univariate models to analyze the phenotypic differences among individuals with different SNP genotypes.

Pathway-based association analysis
The pathway-based analysis work ow is represented in Figure 1. In brief, for each trait, nominal P-values < 0.05 from the GWAS analyses were used to identify signi cant SNPs. With VEP software (http://asia.ensembl.org/Tools/VEP), SNPs were assigned to genes if they were within the genomic sequence of a gene or within a anking region of 5 kb up-or downstream of a gene, to include SNPs located in regulatory regions. The Ensembl GRCg6a database was used as a reference (http://asia.ensembl.org/Gallus_gallus/Info/Index). The background SNPs represent all SNPs tested in the GWAS analyses, whereas the background genes were the genes associated with those SNPs. For assignment of the genes to functional categories, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway [23] databases were used. Pathway enrichment analysis was performed with the OmicShare tools, a free online platform for data analysis (www.omicshare.com/tools). Signi cantly enriched pathways in signi cant genes to compared with the background genes were de ned with a hypergeometric test.
Proportion of phenotypic variance explained by SNPs (PVE) PVE estimation is equivalent to calculating the heritability of SNPs, The GBLUP model was used to estimate the phenotypic variance explained by SNPs of all genotyped individuals [24]: where the de nitions of y, X, Z and e are the same as those in the BLUP model, g is the vector of genomic breeding values to be estimated, following a normal distribution of N(0, Gσ g 2 ), in which σ g 2 is the variance of additive genetic effect, and G is the marker-based genomic relationship matrix. Z is the incidence matrix linking g to y, and e is the vector of residuals. e is the vector of random errors, following a normal distribution of N (0, Iσ e 2 ), in which σ e 2 is the residual variance. The estimation of variance components was performed with the Asreml package [16].

Results
Single SNP-based association study In this study, 1,317 birds were genotyped with Affymetrix Chicken 55K genotyping arrays. According to the QC criteria, 1,212 animals and 36,750 SNPs were used in subsequent analyses. A total of 36,750 SNP markers were obtained with principal component analysis (PCA) by using the rst three principal components ( Figure 2). P-value correction was rst performed with PLINKv1.9 software [19] for block analysis. Bonferroni correction was used to correct the P-values, and 8,068 independent linkage disequilibrium (LD) blocks were obtained.
GAPIT was used for traditional SNP-based association analysis. In total, one associated SNP (5% genome-wide signi cance (6.80E-6,0.05/8068)) on GGA1(chicken chromosome 1) and seven SNPs with a suggestively association ( play a role in p53/TP53-mediated apoptosis. The Q-Q plot is presented in Figure 3. The global view of Pvalues (in terms of -log10 (P-value)) for all SNPs is represented by a Manhattan plot in Figure 4. The raw results of all SNPs are described in Additional le 1: Table S1. To effectively identify SNPs, We performed additional analyses, and the SNPs signi cantly associated with H/L and H/L ebv were tted into the model to examine these associations. We suggest that the ve identi ed loci provide the most reliable signal. The substitution of variant wild-type for mutant SNPs led to a signi cant decrease in H/L and H/L ebv. The results for all SNPs are described in Additional le 2: Figure S1.

Pathway-based association study
The same genotyping dataset was used in this analysis. On the basis of the QC and SNP selection criteria (Methods), 335 pathways were selected from the KEGG dataset, covering 13,068 genes. In brief, nominal P-values < 0.05 from the GWAS analyses were used to identify signi cant SNPs. In total, 1,839 SNP (out of the 36750 tested) were located in annotated genes or in the 10 kb window (up-stream or down-stream of a gene). The H/L trait was analyzed, and we identi ed nine associated pathways (P < 0.05): small cell lung cancer, proteoglycans in cancer, calcium signaling pathway, microRNAs in cancer, sulfur relay system, pathways in cancer, gastric acid secretion, purine metabolism and salivary secretion. The Q-Q plot is shown in Figure 5. The detailed information for each pathway is provided in Additional le: Table  S2 and Table S3. Furthermore, we prepared plots of the number of genes for each pathway class to investigate the gene function. The results of gene classi cation con rmed the accuracy of the GWAS analysis (Additional le 4: Figure S2).

Estimation of phenotypic variance explained by SNPs from various datasets
To determine the contribution of SNPs to the phenotypic variation in H/L, we estimated the phenotypic variance explained by SNPs from various datasets (Table S4, Figure 6). We analyzed all SNPs after QC (QC_SNP, 36,750 SNPs) and found that the phenotypic variation that they were able to explain was 12.4%.
The phenotypic variation explained by GWAS-based loci (5 SNPs) and pathway-based loci (48 SNPs) was 4.2% and 9.1%, respectively. The phenotypic variation in H/L that could be explained by SNPs (53 SNPs, GWAS-based and pathway-based) identi ed jointly by the two analysis was as high as 9.7%.

Discussion
Key genes associated with the H/L ratio Heterophils are a special type of white blood cells in poultry, which are responsible for immune defense and immune regulation in the body's innate immune system. Similarly to mammalian neutrophils, heterophils mainly kill pathogens through trapping, phagocytosis, oxidative bursts, cytokine production and other processes, thus protecting poultry from invasion [1]. The process through which phagocytic leukocytes produce reactive oxygen species (ROS) under external stimuli (such as microorganisms and microorganism-related molecules) is called oxidation or the respiratory burst. When the body is infected by pathogenic microorganisms and produces an acute in ammatory reaction, heterophils migrate to the in ammatory site. C1QBP mediates many biological responses in the immune system, including in ammation, infection and immune regulation [25]. Mechanistically, C1QBP is a direct target gene of ZNF32; it inactivates the p38 mitogen-activated protein kinase (MAPK) pathway, thereby exerting the protective effects of ZNF32 against oxidative stress-induced apoptosis. C1QBP is essential for sustaining mitochondrial membrane potential, and it enhances cellular resistance to oxidative stress [26]. C1QBP promotes the release of cytokines such as TNF-α, IL-1β, IL-6, IL-8 and MCP-1 from mononuclear cells [27,28]. The C1QBP-enhanced phagocytic process occurs through the activity of phosphoinositide 3-kinase (PI3K) [29,30]. C1QBP also plays an essential role in CSF-1 induced migration of macrophages [31].
ENSGALG00000041225 is a novel gene, encoding SH3 domain-containing protein. The SRC Homology 3 Domain (or SH3 domain) is a small protein domain of approximately 60 amino acid residues. This domain has been identi ed in several protein families such as PI3K, Ras GTPase-activating protein, CDC24 and CDC25. SH3 domains are found in proteins involved in signaling pathways regulating the cytoskeleton, such as the Ras protein, the Src kinase and many others. Because PI3K and Src kinase are key enzymes in heterophils, this gene may be associated with heterophil function [32].
Signaling pathways associated with the H/L ratio The regulation of the H/L ratio may be a result of complex interactions among multiple pathways. According to the pathway-based analysis, several well-known pathways associated with immune response were found to be enriched (e.g., small cell lung cancer, proteoglycans in cancer, microRNAs in cancer, pathways in cancer, etc.). In addition, other signaling pathways (calcium signaling pathway, sulfur relay system, purine metabolism and salivary secretion) were enriched signi cantly (P < 0.05).
The sulfur relay system pathway, regulated by MPST and URM1, was signi cantly associated with H/L.
According to previous research, MPST catalyzes the transfer of a sulfur ion from 3-mercaptopyruvate to cyanide or other thiol compounds. MPST is required for thiosulfate biosynthesis and it contributes to the catabolism of cysteine and is an important producer of hydrogen sul de(H 2 S) [33,34].  [35]. Both endogenous and exogenous H 2 S inhibits cell proliferation and cytokine release, and inhibits ERK-1/2 and MAPKs, thus resulting in a clear anti-in ammatory response [36]. H 2 S in uences multiple biological functions of cells through inhibiting the PI3K/Akt/mTOR signaling pathway and consequently cell migration, proliferation and cell division [37].

Contribution of SNPs to the variation in H/L
The dissection of the H/L effect was based on the theory of quantitative polygenes micro-effects. H/L is a complex trait affected by multiple genetic loci, but SNPs signi cantly associated with H/L, according to GWAS analysis results, can usually explain only part of phenotypic variation. This partial explanation is primarily because of interactions among genes or between genes and the environment, and the phenotypic variation that can be explained by SNPs is usually much lower than that of heritability in a narrow sense (e.g., the ratio of additive genetic variance to phenotypic variance). In contrast, non-additive effects do not contribute to narrow-sense heritability, mainly because some causative variation sites affecting phenotypes do not reach the threshold of signi cance, or the causative variation is not in LD with SNPs that have been genotyped [38].An other reason is that GWAS considers SNPs above a signi cance threshold, but a lack of SNPs that reach the threshold does not mean that there is no additive effect. We found that all SNPs after QC explained 12.4% of the phenotypic variation in H/L, and the phenotypic variation explained by phenotypic SNPs was between 4.2% and 9.7%. In the H/L GWAS, combined with the pathway correlation analysis, 53 SNPs associated with H/L explained as much as 9.7% of the phenotypic variation in H/L, a value close to the phenotypic variation in H/L that all SNPs were able to explain. This results also precisely indicates that SNPs signi cantly associated with H/L play a major role in the variation in H/L in all SNPs. Simultaneously, correlation analysis based on pathways can meaningfully supplement the results of GWAS analysis.

Conclusion
This study comprised an SNP-and pathway-based analysis to identify genes associated with chicken H/L traits, revealing genes associated with H/L. The gene C1QBP, identi ed to be associated with H/L content, may be an important candidate gene involved in the regulation of H/L. We also provided ve SNPs that may be used as markers in breeding programs. To our knowledge, this is the rst pathwaybased association analysis of H/L traits and GWAS analysis related to H/L ebv. In GWASs and animal breeding, studies have mainly focused on simple associations of SNPs with traits. The pathway-based analysis provided new ndings relative to previous analyses, and con rmed that complex traits are affected by the cumulative effects of several genes clustered in speci c biological pathways. The pathways found to be associated with traits may be useful in further studies involving gene mapping and developing of breeding programs using SNP markers. Our ndings provide new insights into the genetic mechanism underlying H/L.  , and the red line shows genome-wide 5% signi cance with a P-value threshold of 6.80E-6.