Genome-Wide Association Studies for Immunoglobulins in Colostrum and Serum in Chinese Holstein

CRK, MYO1C, RILP, SERPINF2, AKT1, BCL11B, HHIPL1, DYNC1H1, HSP90AA1, TRAF3, KLC1, IL6, PYCARD, ITGAM, TGFB1I1, GUSB, CRCP, RABGEF1 and SBDS. In this study, we identied 21 candidate genes associated with immunoglobulins level in colostrum and serum in dairy cattle. This founding demonstrated the possibility of increasing immunity through selective breeding and provided an important information for molecular breeding of dairy cattle.


Introduction
Maternal antibodies, especially immunoglobulin (Ig) G, are critical to early survival of domestic animals [1]. In dairy cattle, it is important for calves to gain maternal antibodies in the rst 24 hours after born because the autoimmune system of newborn calves is too weak to resist various diseases.
According to the method for transmission of immunoglobulin from mother to young , mammals are divided into three categories, including only in utero (human and rabbit), only after birth (pig, horse, sheep and cow), or by both mechanisms (dog, rats and mice) [2]. Cow serum and lacteal secretions contain three major classes of immunoglobulin antibodies: IgG, IgM and IgA. In a newborn calf, the IgG, IgM and IgA are absorbed from the colostrum into the circulation within 24-36 h after birth via a non-selective macromolecular transport system [3].
IgG molecule in dairy cattle occurs predominantly in two subclasses: IgG1 and IgG2. In the colostrum of cow which is de ned as the secretion from the mammary gland during the rst 24 h after calving, immunoglobulins make up 70-80% of the total protein content, whereas in mature milk, immunoglobulins account for only 1-2% of the protein [4,5]. IgG1 comprises over 75 % (46.4 mg/ml) of the immunoglobulin antibodies in colostral whey, followed by IgM (6.8 mg/ml), IgA (5.4 mg/ml) and IgG2 (2.9 mg/ml) [4].
Immunoglobulins are components of the innate immune system that can be found in animals without prior exposure to antigens, which have broad speci city [6]. Immunoglobulins participate principally the secondary immune response and possess a multitude of functions such as activate complement-mediated bacteriolytic reactions, augment the recognition and phagocytosis of bacteria by leucocytes (opsonization), prevent the adhesion of microbes to surfaces, inhibit bacterial metabolism, agglutinate bacteria, and neutralize toxins and viruses [7].
Previous GWASs have been performed for immunoglobulins content in serum or mature milk, the rst study of these in 2,247 individuals from four European cohorts (CROATIA-Vis, CROATIA-Korcula, Orkney Complex Disease Study and Northern Swedish Population Health Study) identi ed 4 loci encoding glycosyltransferases associated with IgG N-glycans [17]. The GWAS for immunoglobulin G glycosylation patterns in human indicated that RUNX family transcription factor 3 (RUNX3) was associated with decreased galactosylation and involved in both IgA class switching and B-cell maturation as well as T-cell differentiation and apoptosis [18]. In pig, 4 signi cant SNPs were identi ed by GWAS performed for four traits, including interferon-gamma (IFN-c) and interleukin 10 (IL-10) levels, the ratio of IFN-c to IL-10 and IgG blocking percentage to CSFV in serum [19]. Especially, a previous GWAS for blood natural antibodies in Canadian Holstein cows found the 23 SNPs were signi cantly associated with IgG [12]. Another GWAS for milk natural antibodies in Dutch Holstein-Friesian cattle identi ed candidate genes on chromosome 17, 18, and 21 that related to immunoglobulin structure and early B cell development [20]. However, no study on immunoglobulin in colostrum have been reported in dairy cattle so far. In the present study, our objective is to identify candidate genes related to contents of immunoglobulins in colostrum and serum in dairy cattle and provide molecular information for genetically improving calves' disease resistance.

Animals and phenotypes
The animals used in this study consists of 620 Chinese Holstein cows, the daughters of 44 sires with average 13.78 cows per sire. All the cows were from 10 dairy farms in the Beijing Dairy Cattle Center and the Beijing Sanyuan Lvhe Dairy Farming Center. The blood serum and colostrum samples were taken from each cow during the rst milking within 24 h after calving for measurement of immunoglobulins. Also, hair follicle samples were collected from each animal for SNP chip genotyping. The cows were on average parity of 2.52, ranging from 1 to 4. The whole procedure for collection of the samples (blood, hair and colostrum) was carried out in strict accordance with the protocol approved by the Animal Welfare Committee of China Agricultural University (Permit number: DK996).

Data imputation and quality control
To make most use of SNPs from high density chip, the total 32 samples genotyped with 50k chip were imputed to 150k chip based on our previously developed reference population that included 1198 Chinese Holstein cows by using the BEAGLE 3.0.4 software [21]. Allelic R 2 was estimated as an indicator of imputation accuracy based on the genotype probabilities.
Then we performed PLINK [22] for SNPs quality control and removed the SNPs with call rates < 90%, minor allele frequencies < 0.1, a deviation from Hardy-Weinberg equilibrium (HWE) P values < 10 -6 and > 10 % missing genotypes [23]. After considering all the quality control measures, totally 88,934 SNPs were included in the genome-wide association study (Additional les 1 and 2). All SNP positions were determined according to the Bos taurus UMD 3.1 genome assembly.

Statistical analysis
Mixed model based single locus regression analyses (MMRA) The association of the individual phenotype with each individual SNP was performed via following a mixed linear model: y = Xβ +g + Aσ g 2 +Iσ ε 2 Where y is an n × 1 the vector of root square or log transformed corrected concentration immunoglobulins of all cow after calving with n being the sample size of 619; β is the vector of xed effects, including herd, parity and season of calving, X is an incidence matrix relating elements of β to y; g is an n × 1 vector of the total genetic effects of the individuals with g ~N (0, Aσ g 2 ), and A is interpreted as the genetic relationship matrix (GRM) between individuals. We can therefore estimate σ g 2 by the restricted maximum likelihood (REML) approach [24], relying on the GRM estimated from all the SNPs. I is an n × 1 identity matrix, and ε is a vector of residual effects with ε~N (0, Iσ ε 2 ).
GWAS was carried out by GCTA 1.90.2 software [25]. GCTA is the method which can estimate variance explained by all SNPs, and extend the method to partition the genetic variance onto each of the chromosomes and also to estimate the variance explained by the X chromosome.
In order to de ne the thresholds for genome-wide signi cant/suggestive associations, we calculated the effectively independent tests based on the estimated number of independent markers and linkage disequilibrium blocks for markers [26]. A linkage disequilibrium block was de ned as a set of adjacent SNPs with pairwise r 2 values greater than 0.40 [27].
A total of 15,820 effectively independent tests was recommended and the threshold P-value for genome-wide signi cance association was 3.16E-6 (0.05/15,820) and for suggestive association was 6.32E-5 (1/15,820) suggested by Lander & Kruglyak [26,28]. In ation or de ation in p-values due to strati cation or family structure was assessed by genomic in ation factor (λ) and also visually inspected by quantile-quantile (Q-Q) plot. λ is calculated as the median of the χ2 test statistics (1 degree of freedom) divided by its theoretical median under the null distribution [29]. The λ was calculated using GENABEL packages of R.

Haplotype block analysis
To further detect candidate regions associated with the immunoglobulins contents in colostrum and serum, we performed linkage disequilibrium analysis for the chromosomal regions with adjacent multiple signi cant SNPs based on Haploview v4.2 [30]. The block was de ned according to the solid spin algorithm by the criteria of Gabriel [31].

Identi cation of candidate genes and functional annotation
To identify the positional candidate genes that are potentially associated with the levels of immunoglobulins in colostrum and serum, the genomic regions that were close to the signi cant SNPs with a distance less than 1 Mb were selected. These regions were then referenced against the bovine genome assembly (UMD 3.1, Ensemble Genes Release 92) using the BioMart tool (http://asia.ensembl.org/biomart/martview) to found genes that are located in the vicinity of the signi cant SNPs. Simultaneously, we performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis for these extracted genes by KOBAS 3.0 [32]. KOBAS annotates a set of genes with putative pathways and immunity relationships by mapping to genes with a known annotation.
Afterwards, we compared the physical positions of the identi ed candidate genes with the reported quantitative traits loci (QTLs) that have been shown to be associated with immune capacity traits for cattle in the Animal QTL database (http://www.animalgenome.org/cgibin/QTLdb/index).

Statistics of phenotype and SNP data
In this study, we analyzed 10 traits, including concentration of IgG, IgG1, IgG2, IgA and IgM in colostrum and serum. Mean and the corresponding standard deviations for the original and corrected phenotypic values with or without square root or log transformation were shown in Table 1.

Genome-wide association study
For the genotype imputation with the software Beagle 3.0.4, SNPs with R 2 > 0.3 were retained [33] and the average allelic R 2 of imputed genotypes was 74.8% , which made sure the accuracy of imputation and full use of chip data. With GCTA 1.90.2, no population substructure was observed form QQ plots ( Figures. 1). In addition, the in ation factor (λ) estimated was from 0.98-1.02 for all traits, which indicated that our results can be accepted for further analysis.
For the contents of immunoglobulins in colostrum, as shown in Figure. 2 and Table 2 Table  S1). Out of these, with GO and KEGG analysis, 151 genes were observed participated in immune related pathways such as immune response, Fc gamma R-mediated phagocytosis, negative regulation of immunoglobulin secretion, humoral immune response, Fc-epsilon receptor and NF-kappaB signaling pathways, etc (Additional le 4: Table S2).
Based on the Cattle QTL database that has released 232 loci for immune capacity until now (April 26, 2020, http://www.animalgenome.org/cgi-bin/QTLdb/), we compared the physical positions of the 151 candidate genes with the peak of the known QTLs for immune capacity in dairy cattle, including IgG level, FMDV peptide-induced cell proliferation and ConA-induced cell proliferation. Consequently, 21 genes were found located within the QTL regions with distance to the peak positions of less than 1.0 cM so that they were considered as promising candidates for immunoglobulins level in serum and colostrum (Table 4)

Discussion
In this study, we identi ed candidate region in uencing immunoglobulins in colostrum and serum by performing GWASs with high density SNP genotypes. We found 3 signi cant regions on 21, 10 and 17 chromosome and 78 suggestive signi cant SNPs on other 20 chromosomes associated with IgG, IgA and IgM. To our knowledge, this is the rst investigation focusing on unrevealing the genetic mechanism of these immune traits in bovine colostrum based on a high-density SNP chip panel.
Population stratification is a major factor for false positives in GWAS. In general, a genomic inflation factor λ of <1.05 indicates no population stratification [34]. In this study, λ values were 0.98~1.02 for concentration and ratio of IgG, IgM and IgA in colostrum and serum. QQ plots indicated that population stratification has been controlled as well. In addition, we calculated the effectively independent tests based on the estimated number of independent markers and linkage disequilibrium blocks for autosome markers, and a total of 15,820 effectively independent tests was suggested. This setting of thresholds may lead slight false positive but numerous useful information.
In this GWAS, we found a region on BTA-21 from 63.9 to 71.5Mb that was signi cantly associated with for IgG2. Previous GWASs on bovine nature antibodies in Canadian and Dutch Holstein populations identi ed genomic regions on BTA-21 from 55.5 to 70.6 Mb and 66.0 to 71.6 Mb that were signi cantly associated with KLH-IgG and PGN-IgG1 [12,20], which overlapped with region in this study. Although our samples were taken from colostrum and serum, whereas theirs were from serum and mature milk, and we analyzed IgG, IgG1 and IgG2 while they analyzed IgG and IgG1, it is likely due to the high genetic correlation between the same IgG isotype in serum and milk (0.81±0.18) [13]. This implies that the all IgG isotype subclasses from serum or milk shared the same genetic mechanism.
For IgM, although its heritability was relatively high, we detected 10 and 5 SNPs associate with concentration of IgM in colostrum and serum, respectively, we did not identi ed any candidate genes for this trait. Also, the identi ed signi cant SNPs for IgM in colostrum and serum were totally different. Similar results were reported in a previous GWAS by de Klerk et al. [12]. As for the other genomic regions we identi ed for immunoglobulins have not been reported in previous studies in bovine.

Role of Candidate Genes
For IgG1 in colostrum and serum, 6 candidate genes were identi ed and from 22.3~23.4 Mbp on BTA-19. ABR, encodes protein that contains a GTPase-activating protein domain. Previous studies identi ed ABR and BCR as the only GTPase-activating proteins that speci cally negatively regulate Rac function in vivo in primary macrophages and normally curb very speci c functions of mature tissue innate immune cells [35,36]. The protein encoded by TIMM22 is a subunit of TIM22 which is transporters and represents the voltageactivated and signal-gated channel. TIMM22 is involved in protein transmembrane transporter activity, which may relate to the transport of IgG though the receptor. CRK encodes a member of an adapter protein family that binds to several tyrosine-phosphorylated proteins. Differential migration of CRK/CRKL-de cient T cells resulted in e cient graft-versus-leukemia responses in mice [37]. MYO1C encodes a member of the unconventional myosin protein family which play a role in bacterial infectious disease and Pathogenic Escherichia coli infection. RILP encodes a lysosomal protein that is related to the phagosomes in macrophages [38]. SERPINF2 encodes a member of the serpin family of serine protease inhibitors. The protein is a major inhibitor of plasmin, which degrades brin and various other proteins. SERPINF2 is involved in the recruitment of lymphocytes in the peripheral tissues and the progression of brosis. Besides, SERPINF2 contributes to the maintenance of immunological functions that are related to IgE in human [39].
For IgG2, totally 7 candidate genes were found on BTA-21 from 65.8~70.9Mbp. AKT1, is part of the PI3K-AKT pathway and involved in B cell receptor, T cell receptor, Fc gamma R-mediated phagocytosis pathway. Some of the known functions include T cell development and FOXO1 signaling, which involves maturation and survival of peripheral B cells and class switching [40][41][42]. It is worth mentioning that AKT1 was also identi ed as candidate gene for IgG1 in a previous study [20]. BCL11B encodes a C2H2-type zinc nger protein that regulates the initial stages of human T-cell differentiation and mediated epigenetic repression for histone deacetylase inhibitors in cutaneous T-Cell lymphoma [43][44][45]. HHIPL1 belongs to the glucose/sorbosone dehydrogenase family and take part in receptor-mediated endocytosis so that it can effect receptor activity. DYNC1H1 encodes a member of the cytoplasmic dynein heavy chain family that function as molecular motors and play a role in Salmonella infection. The protein encoded by HSP90AA1 is an inducible molecular chaperone that functions as a homodimer. Hsp90 was related to beta cell autoimmunity and protected cells from complement-dependent cytotoxicity by inhibiting, together with mortalin, C5b-9 assembly and/or stability at the plasma membrane [46,47]. The protein encoded by TRAF3 is a member of the TNF receptor associated factor (TRAF) protein family that participates in the signal transduction of CD40, a TNFR family member important for the activation of the immune response and was found to be a critical component of the lymphotoxin-beta receptor (LTbetaR) signaling complex, which induces NF-kappaB activation and cell death initiated by LTbeta ligation [48,49]. The ndings suggest that Parkin plays a novel role in innate immune signaling by targeting TRAF3 for degradation and maintaining the balance of innate antiviral immunity [50]. A TRAF3-NIK axis differentially regulates viral DNA vs RNA pathways in innate immune signaling [51]. KLC1 encodes a member of the kinesin light chain family. KLC1 is associated with infectious disease caused by bacterial. Additionally, 2 and 4 signi cant SNPs for the ratio of IgG1 and IgG2 in colostrum and serum but no candidate gene were selected, which might attribute to the less transmission but most self-synthetization.
For IgA, we selected 7 candidate genes. PYCARD, encodes an adaptor protein that is composed of two proteinprotein interaction domains: a N-terminal PYRIN-PAAD-DAPIN domain (PYD) and a C-terminal caspaserecruitment domain (CARD). The PYD and CARD domains are members of the six-helix bundle death domainfold superfamily that mediates assembly of large signaling complexes in the in ammatory and apoptotic signaling pathways via the activation of caspase. In normal cells, this protein is localized to the cytoplasm. Han Y et.al found that PYCARD as a negative regulator of the MAVS-mediated innate immunity, may play an important role in host protection upon virus infection [52]. ITGAM gene encodes the integrin alpha M chain. The alpha M beta 2 integrin is important in the adherence of neutrophils and monocytes to stimulated endothelium, and also in the phagocytosis of complement coated particles and ITGAM as a ligand for the integrin Mac-1 and suggest that many immune-modulating effects previously ascribed to PF4 are mediated through its interaction with Mac-1 [53]. TGFB1I1 encodes a coactivator of the androgen receptor, a transcription factor which is activated by androgen and has a key role in male sexual differentiation. Hic-5 regulates GR binding site selection by a novel mechanism, exploiting gene-speci c requirements for chromatin remodeling enzymes to selectively in uence DNA occupancy and gene regulation by a transcription factor [54]. GUSB encodes a hydrolase that degrades glycosaminoglycans, including heparan sulfate, dermatan sulfate, and chondroitin-4,6-sulfate and play role in HCV-infected human liver [55]. CRCP encodes a membrane protein that functions as part of a receptor complex for a small neuropeptide that increases intracellular cAMP levels. The in uence of the CGRP family peptides in reproduction and pregnancy is explored and discussed [56]. RABGEF1 forms a complex with rabaptin-5 (RABPT5; MIM 603616) that is required for endocytic membrane fusion. Millrine D et, al. identi ed Rabex-5 as an immunomodulatory drugs (IMiDs) target molecule that functions to restrain TLR activated auto-immune promoting pathways. We propose that release of Rabex-5 from complex with Cereblon enables the suppression of immune responses, contributing to the anti-in ammatory properties of IMiDs [57]. Additionally, RABGEF1 mediates recycling endosome fusion with GAS-containing autophagosome-like vacuoles through the STX6-VAMP3-VTI1B complex; SNAREs are involved in autophagosome formation in response to bacterial infection [58]. SBDS encodes a highly conserved protein that related to primary immunode ciency and phagocyte defects. In addition, SBDS play a role in immune system process and hematopoietic or lymphoid organ development. Another candidate gene in chromosome 4, IL6, encodes a cytokine that functions in in ammation and the maturation of B cells. Besides, IL6 has been shown to have important functions in immune response, hematopoiesis, in ammation and the acute phase response. The protein is primarily produced at sites of acute and chronic in ammation, where it is secreted into the serum and induces a transcriptional in ammatory response through interleukin 6 receptor, alpha [59,60]. In the breeding perspective of dairy cattle, the relatively high heritability of immunoglobulin and available genetic information provides an opportunity to include immunoglobulin as a new trait to improve dairy cattle health. Although plentiful studies for the bene ts of IgM and to a lesser extent IgG in mice [61][62][63], the bene cial correlations reported between immunoglobulin and resistance against infectious diseases are minimal in livestock [15,64,65]. Hence, the character of immunoglobulin in immunity of dairy cattle needs further veri cation. It cannot be ignored that a number of studies are proposing self-antigens as the stimulator of B1 cells and reporting both negative and positive roles of self-reactive immunoglobulin in autoimmune diseases [62]. In mice, lupus-like autoimmune diseases developed when immunoglobulin production was impaired, but mice that could produce only IgM

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.