Genome-wide association studies for immunoglobulin concentrations in colostrum and serum in Chinese Holstein

Background The early death and health problems of calves caused substantial economic losses in the dairy industry. As the immune system of neonates has not been fully developed, the absorption of maternal immunoglobulin (Ig) from colostrum is essential in protecting newborn calves against common disease organisms in their early life. The overwhelming majority of Ig in bovine whey is transported from the serum. Therefore, Ig concentration in the colostrum and serum of dairy cows are critical traits when estimating the potential disease resistance of its offspring. Results Colostrum, blood, and hair follicle samples were collected from 588 Chinese Holstein cows within 24 h after calving. The concentration of total IgG, IgG1, IgG2, IgA and IgM in both colostrum and serum were detected via ELISA methods. With GCTA software, genome-wide association studies (GWASs) were performed with 91,620 SNPs genotyped by GeneSeek 150 K (140,668 SNPs) chips. As a result, 1, 5, 1 and 29 significant SNPs were detected associated with the concentrations of colostrum IgG1, IgG2, IgA IgM, and serum IgG2 at the genome-wide level (P < 3.08E–6); 11, 2, 13, 2, 12, 8, 2, 27, 1 and 4 SNPs were found significantly associated with total IgG, IgG1, IgG2, IgA and IgM in colostrum and serum at the suggestive level (P < 6.15E–5). Such SNPs located in or proximate to (±1 Mb) 423 genes, which were functionally implicated in biological processes and pathways, such as immune response, B cell activation, inflammatory response and NF-kappaB signaling pathways. By combining the biological functions and the known QTL data for immune traits in bovine, 14 promising candidate functional genes were identified for immunoglobulin concentrations in colostrum and serum in dairy cattle, they were FGFR4, FGFR2, NCF1, IKBKG, SORBS3, IGHV1S18, KIT, PTGS2, BAX, GRB2, TAOK1, ICAM1, TGFB1 and RAC3. Conclusions In this study, we identified 14 candidate genes related to concentrations of immunoglobulins in colostrum and serum in dairy cattle by performing GWASs. Our findings provide a groundwork for unraveling the key genes and causal mutations affecting immunoglobulin concentrations in colostrum and important information for genetic improvement of such traits in dairy cattle. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08250-5.


Background
The early survival rate and health of calves are important factors affecting the production efficiency of the dairy industry. It was reported that approximately 31% of preweaning mortality events occurring in the first 3 weeks of life were attributed to the low serum IgG concentration of calf (less than 10 mg/mL when sampled between 24 and 48 h of age) [1,2]. Indeed, the immune system of neonates has not been fully developed depending almost entirely on the transport of maternal immunoglobulin (Ig) from colostrum after birth. Hence, the absorption of colostrum Ig during the first 24 h after birth is essential for the health and survival of the neonatal calf.
Immunoglobulins are the major protein components of colostrum, comprising 70-80% of the total protein content, while in mature milk, immunoglobulins constitute only 1-2% [3,4]. There are three major immunoglobulins in bovine serum and milk: IgG, IgM and IgA, with IgG consisting of two subclasses (IgG1 and IgG2). IgG1 accounts for over 75% (46.4 mg/ml), and IgM (6.8 mg/ml), IgA (5.4 mg/ml) and IgG2 (2.9 mg/ml) are followed successively [3]. Immunoglobulins are produced by B1-cells and possess a multitude of functions such as activate complement-mediated bacteriolytic reactions, augment the recognition and phagocytosis of bacteria by leucocytes (opsonization), prevent the adhesion of microbes to surfaces, inhibit bacterial metabolism, agglutinate bacteria, and neutralize toxins and viruses [5]. Bovine colostrum immunoglobulins are notably transported from the serum and accumulated in the mammary gland during the prepartum dry period [6,7]. Hence, delineation of the genetic architecture underlying the concentrations of immunoglobulins in cows' colostrum and serum is important for identifying ways to improve the survival rate of neonatal calves in dairy cattle.
Genome-wide association studies (GWASs) have been performed for immunoglobulins in serum or mature milk. The first GWAS based on 2247 individuals from four European cohorts (CROATIA-Vis, CROATIA-Korcula, Orkney Complex Disease Study and Northern Swedish Population Health Study) identified 9 genome-wide significant loci associate with IgG glycosylation and 4 out of them contained genes encoding glycosyltransferases [17]. Another GWAS for IgG glycosylation patterns in humans indicated that RUNX family transcription factor 3 (RUNX3) was associated with decreased galactosylation and involved in both IgA class switching and B-cell maturation as well as T-cell differentiation and apoptosis [18]. In pigs, 2 genome-wide and 4 chromosome-wide significant SNPs were detected for IgG blocking percentage to CSF virus in serum by performing GWAS [19]. Especially, a GWAS for blood natural antibodies in Canadian Holstein cows identified 23 SNPs that were significantly associated with IgG concentration at genome-wide level [20]. Another GWAS for milk natural antibodies in Dutch Holstein-Friesian cattle identified some significant SNPs for IgG1 and IgM with candidate genes on Bos taurus autosome (BTA) 3, 17, 18, and 21 that related to immunoglobulin structure and early B cell development [21]. However, there are few studies on gene identification for immunoglobulin concentrations in colostrum in dairy cattle so far. In addition, in bovine colostrum, the immunoglobulins were found mainly derived from serum [6,7]. Hence, investigation on the concentrations of immunoglobulins in both colostrum and serum can better disentangle the genetic architecture underlying colostrum immunoglobulin traits. Here, we conducted genome-wide association studies for the concentrations of immunoglobulin components in colostrum and serum in a Chinese Holstein population to identify the functional genes that contributed to the phenotypic variation of colostrum immunoglobulins and provide molecular information for genetically improving such traits to increase calves' disease-resistance.

Genome-wide association study
After LD analysis, a total of 16,257 effectively independent tests number were suggested. Thus, the threshold P-value for genome-wide significant association was set at 3.08E-6 (0.05/16,257) and that for suggestive significant association was 6.15E-5 (1/16,257) [22]. Based on the QQ plots (Figs. 1 and 2) and the estimated inflation factor (λ) of 0.98-1.03 for all traits, no population stratification was observed.

Candidate genes and function analysis
After comparing to the reference genes (UMD 3.1), a total of 423 genes that contained or were adjacent to (± 1 Mb) the significant SNPs were mapped, including 392 protein-coding genes, 30 non-coding RNAs and 1 pseudogene (Additional file 1: Table S1).
To further investigate the biological functions of these candidate genes, we performed GO and KEGG analysis and observed that 73 genes were enriched in immunerelated biological processes and pathways such as adaptive immune response based on somatic recombination of immune receptors built from immunoglobulin superfamily domains, immune response, B cell activation, inflammatory response, and NF-kappaB signaling pathways (Additional file 2: Table S2). Simultaneously, we compared the physical positions of the 423 genes with the peak of the known QTLs that have been shown associated with immune capacity in dairy cattle (Cattle QTLdb), including IgG level, FMDV peptide-induced cell proliferation, ConA-induced cell proliferation and Clinical mastitis. Consequently, 226 genes were found located within the QTL regions with a distance to the peak positions of less than 1.0 cM.

Discussion
In this study, we identified the chromosome regions related with immunoglobulin concentrations in colostrum and serum in dairy cattle by performing GWASs with high density SNP genotypes. Consequently, we detected 19, 5, 74, 4 and 16 significant SNPs associated with the total IgG, IgG1, IgG2, IgA and IgM, respectively. To our knowledge, this is the first investigation on the genetic architecture of colostrum immunoglobulins in dairy cattle.
In general, a genomic inflation factor λ of < 1.05 suggests no population stratification [23]. In this study, the calculated λ values ranged from 0.99 to 1.03 for concentrations of Ig concentration in colostrum and serum, suggesting population stratification was well controlled.
In the present study, the significant SNPs associated with the concentration of IgG2 in colostrum and serum were almost entirely distributed on BTA21 from 63.3 to 71.5 Mb. Similarly, two previous GWASs in Canadian and Dutch Holstein populations observed that the significant SNPs for IgG in serum and IgG1 in mature milk were mainly located in BTA21 from 55.5 to 70.6 Mb and 66.0 to 71.6 Mb, which contained the region identified in our study [20,21]. The previous studies revealed that the main locus of bovine immunoglobulin heavy chain variable genes was located on approximately 71.5 Mb of BTA21 [24,25], indicating this region may be related to the formation of immunoglobulin. Concurrently, the significant SNPs for the total IgG and IgG1 concentrations in colostrum and serum distributed on multiple chromosomes, including BTA2, 3, 4, 6, 9, 11, 18, 19, 22 and 30, which is inconsistent with the previous two GWASs for IgG in serum and IgG1 in mature milk. Such inconsistency was most likely due to the huge difference of Ig formation mechanism and concentration between colostrum and mature milk. The majority of bovine colostrum Ig was transported from serum and accumulate in the mammary gland during the prepartum dry period, under the influence of prolactin and ceases abruptly at parturition, resulting in 200 times difference between Ig concentration in colostrum and mature milk [26]. Furthermore, 2 significant SNPs associated with IgM concentrations in colostrum and serum were detected on 73.3 and 74.2 Mb of BTA17, very close to 2 significant SNPs on BTA17 (72.5 to 73.6 Mb) identified for IgM in mature milk in a previous GWAS in Dutch Holstein populations [20]. The remaining significant SNPs were first reported in this study.
Combing the biological functions of the 423 functional genes that contained or were closed to the significant SNPs with less than 1 Mb and the known QTL data for immune traits in bovine,14 promising genes were identified for Ig. Of these, 2, 2, 3 and 1 candidate genes were selected for the total IgG, IgG1, IgG2 and IgM concentration in colostrum, respectively. FGFR2 and FGFR4 belong to the fibroblast growth factor receptor family which has been shown to mediate pro-inflammatory signaling in the liver and airway epithelium in chronic obstructive pulmonary disease [27]. NCF1 encodes a cytosolic subunit of neutrophil NADPH oxidase, an enzyme responsible for reactive oxygen species (ROS) production, which is pivotal in both host defense and the control of inflammation [28,29]. IKBKG encodes the regulatory subunit of the inhibitor of kappaB kinase (IKK) complex, which activates NF-kappaB resulting activation of genes involved in inflammation and immunity [30,31]. SORBS3 encodes an SH3 domain-containing adaptor protein that regulates cell adhesion and signal transduction. The deficiency of adaptor protein could suppress vascular inflammation and inactivate Akt-nuclear factor κB signaling [32]. IGHV1S18 is an immunoglobulin heavy chain variable region that encodes Ig heavy chain and is directly related to the formation of immunoglobulins. KIT encodes a receptor tyrosine kinase that is associated with the earliest neutrophil developmental stages [33]. PTGS2 could activate the NF-κB signaling pathway which plays a key role in regulating the immune response to infection [34]. Simultaneously, 2, 1, 1 and 2 candidate genes were opted for the concentration of IgG, IgG1, IgG2 and IgM in serum, respectively. Of these, BAX belongs to the BCL2 protein family which could regulate B cell homeostatic proliferation and apoptotic process [35]. GRB2 encodes growth factor receptor-bound protein 2, which could regulate B-cell maturation, B-cell memory responses and inhibits B-cell Ca2 + signaling [36]. Thousand and one kinase 1 (TAOK1) could as a negative regulator of IL-17 to mediate signal transduction and inflammation, controlling colitis of inflammatory bowel disease [37]. ICAM1 encodes a cell surface glycoprotein which is typically expressed on endothelial cells and cells of the immune system. Upregulation of ICAM1 in a mechanism involving NF-қB could inhibit the Epstein-Barr virus infection [38]. The expression level of TGFB1 was associated with melanoma immune response [39]. The protein encodes by RAC3 is a member of the p160 family of nuclear receptor coactivators that plays an important role in NF-kappaB activation [40].
Generally, all these genes played vital roles in the inflammation, neutrophil activation, resistance to viruses, NF-kappaB, B cell homeostasis and immunerelated process, which indicated the potentially important roles of Ig in colostrum and serum in resistance to infectious diseases.
In the present study we identified 8 and 6 first-time candidate genes for immunoglobulins in dairy cattle  colostrum and serum, respectively. From the breeding perspective, our findings provide important molecular information for the genetic improvement program on health and disease-resistance traits in dairy cattle. On the other hand, as the absence of biological validation and the measurements of serum Ig in the offspring, further in-depth investigations are needed to better understand the genetic mechanisms on how these genes regulated and impacted the formation of immunoglobulins in colostrum before applying them on the breeding of dairy cattle.

Conclusion
In this study, we conducted genome-wide association studies for the concentrations of immunoglobulins in colostrum and serum in Chinese Holstein.

Animals and phenotypes
The animals used in this study consist of 588 Chinese Holstein cows daughters of 44 sires from 10 dairy farms in the Beijing Dairy Cattle Center and the Beijing Sunlon Livestock Development Company Limited. The pedigree contained 1839 animals and was provided by the Beijing Dairy Cattle Center. The average number of daughters per sire was 13.4. Cows ranged from parity 1 to 4 (mean = 2.52). The blood serum and colostrum samples were taken from each cow during the first milking within 24 h after calving for measurement of immunoglobulins. Hair follicle samples were collected from each animal for SNP chip genotyping as well. The whole procedure for collection of the samples (blood, hair and colostrum) was implemented in strict accordance with the protocol approved by the Animal Welfare Committee of China Agricultural University (Permit number: DK996). The animals used in this study were all released to their own population for normal production after sample collection. The concentrations of immunoglobulins of each colostrum and serum sample were measured, including total IgG (Bovine IgG ELISA Quantitation Set E10-118, Bethyl Laboratories, Montgomery, TX, USA), IgG1 (Bovine IgG1 ELISA Quantitation Set, E10-116), IgG2 (Bovine IgG2 ELISA Quantitation Set, E10-117), IgA (Bovine IgA Quality control was conducted on PLINK 1.90 software and the filtering processes were as follows: firstly, samples with all SNPs genotyping rate < 95% were deleted; then, SNPs with call rates < 90%, minor allele frequencies (MAF) < 0.1 and Hardy-Weinberg equilibrium (HWE) p-values < 10-6 were discarded [41,42]. Thus, 563 individuals with 91,620 SNPs were kept for further analysis (Additional files 3 and 4).

Statistical analysis Mixed Model based single locus Regression Analyses (MMRA)
We performed single-SNP association analysis for the individual phenotype in GCTA 1.90.2 with the following mixed linear model: Where y is a vector of transformed phenotypes (the concentration of IgG, IgG1, IgG2, IgA and IgM in colostrum and serum) of all cows; μ is the overall mean; f is the vector of fixed effects, including herd (classes: 1 to 10), parity (classes: 1 = parity 1, 2 = parity 2, 3 = parity 3 and 4 = parity 4) and season of calving (classes: 1 = March to May, 2 = June to August, 3 = September to November and 4 = December to February), X is an incidence matrix relating elements of f to y; c is the vector of the SNP genotype indicators which take values 0, 1 or 2 corresponding to the three genotypes 11, 12 and 22 (assuming 2 is the allele with a minor frequency), b is the regression coefficient of y on c; g is the vector of residual polygenic effects with g ~ N (0, Gσ g 2 ) (where G is the genomic relationship matrix and σ g 2 is the additive variance), Z is the incidence matrix of g; e is the vector of residual errors with e ~ N (0, Iσ e 2 ) (where I is the indentity matrix and σ e 2 is the residual variance). The heritability estimation were carried out by GCTA 1.90.2 software.
The existence of linkage disequilibrium (LD) of SNPs in every chromosome may lead to over-correction when using Bonferroni adjustments [41,43]. Hence, we used an y = 1µ + Xf + bc + Zg + e effectively independent test number to define the thresholds for genome-wide/suggestive significant associations based on the assessed number of independent markers and linkage disequilibrium blocks for markers on every chromosome [22].
Population stratification can result in spurious association findings in a GWAS [42]. Thus, we calculated the genomic inflation factor (λ) and depicted quantile-quantile (Q-Q) plot to assess stratification in our study population using qqman packages in R 3.6.0.

Identification of candidate genes
To further identify the candidate genes associated with the concentrations of immunoglobulins, we selected the functional genes that contained or were adjacent to the significant SNPs with less than 1 Mb based on the bovine gene set in RefSeq database (Bos_taurus_UMD_3.1; http:// hgdow nload. cse. ucsc. edu/ golde nPath/ bosTa u6/ datab ase/). Additionally, to figure out the biological functions of these genes, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment were implemented with DAVID Bioinformatics Resources (https:// david. ncifc rf. gov). In addition, we also compared the physical position of these functional genes with the reported quantitative traits loci (QTLs) for immune capacity traits in the Cattle QTL database (https:// www. anima lgeno me. org/ cgi-bin/ QTLdb/ BT/ index).