Identification of hub genes associated with somatic cell score in dairy cows

Somatic cell count (SCC) is used as an indicator of udder health, and its log transformation is called the somatic cell score (SCS). Several QTL and genes associated with SCS have been identified. This study aimed to identify the most important genes associated with SCS. We compiled 168 genes reported to be significantly associated with SCS and used pathway analysis and network analysis to identify hub genes. Pathway analysis of these genes identified 73 gene ontology (GO) terms associated with SCS. These GO terms span molecular function, biological process, and cellular component, and the identified pathways are directly or indirectly linked with the immune system. A gene network was constructed, and from this network 17 hub genes (CD4, CXCL8, TLR4, STAT1, TLR2, CXCL9, CCR2, IGF1, LEP, SPP1, GH1, GHR, VWF, TNFSF11, IL10RA, NOD2, and PDGFRB) associated with SCS were identified. The subnetwork analysis yielded 10 clusters, with cluster 1 containing all identified hub genes except VWF. Most hub genes and pathways identified in our study were involved in inflammatory and cytokine responses. The results of the current study provide knowledge of the genetic basis and biological mechanisms controlling SCS. Therefore, the identified hub genes may be regarded as the main genes for genomic selection for mastitis resistance.


Introduction
Mastitis is one of the most economically devastating diseases in dairy cows because of decreased milk production, veterinary costs, culling, and milk disposal (Seegers et al. 2003). Mastitis is an inflammatory response to a bacterial infection in the udder; it is classified as clinical, sub-clinical, or chronic (Cobirka et al. 2020). This inflammation recruits immune cells to the mammary gland and increases the number of somatic cells in milk (Jain 1979).
Mammary gland somatic cells consist of epithelial cells, leukocytes (neutrophils, macrophages, and lymphocytes), and erythrocytes (Sharma et al. 2011). Somatic cell count (SCC) is used as an indicator of udder health, and an SCC cutoff of 200,000 cells/ml is used to differentiate between infected and uninfected udders (Schwarz et al. 2020). The distribution of SCC is not normal; therefore, a log transformation, the somatic cell score (SCS), is used: SCS = log2(SCC/100) + 3, with SCC expressed in thousands of cells/ml (Ali and Shook 1980). A highly favorable genetic correlation between clinical mastitis and SCS has been reported (Rupp and Boichard 1999; Carlen et al. 2004). Several QTL and genes have been associated with SCS (Mullen et al. 2011; Marete et al. 2018; Zeb et al. 2020; Kim et al. 2021). Genes work together, and their interactions carry out biological functions; a group of genes with similar biological functions is referred to as a gene module (Gong et al. 2007). Genes associated with SCS can be used to reveal the signaling pathways controlling SCS (García-Campos et al. 2015). The identified pathways related to SCS can in turn be used to identify highly correlated genes and construct gene networks (Wu et al. 2014). The constructed networks are then used to identify hub genes (Li et al. 2018; Liu et al. 2021). Hub genes are the highly interconnected genes in a gene network (Seo et al. 2009).
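As a worked illustration of the SCS transformation given above, the following minimal Python sketch converts SCC to SCS. It assumes that SCC is expressed in thousands of cells/ml, so that the 200,000 cells/ml cutoff corresponds to SCC = 200; the function name is hypothetical and is not taken from the study.

```python
import math


def somatic_cell_score(scc_thousands_per_ml: float) -> float:
    """Return SCS = log2(SCC/100) + 3.

    Assumption: SCC is given in thousands of cells/ml.
    """
    if scc_thousands_per_ml <= 0:
        raise ValueError("SCC must be positive to take a logarithm")
    return math.log2(scc_thousands_per_ml / 100) + 3


# Each doubling of SCC raises the score by exactly one unit.
cutoff_scs = somatic_cell_score(200)   # 200,000 cells/ml -> 4.0
baseline_scs = somatic_cell_score(100)  # 100,000 cells/ml -> 3.0
print(cutoff_scs, baseline_scs)
```

Under this assumption the infected/uninfected cutoff of 200,000 cells/ml corresponds to a score of 4, and each unit of SCS corresponds to a doubling of SCC.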
Most countries show growing interest in including udder health traits in national selection indices, and SCS is one of the udder health traits in dairy cows (Miglior et al. 2005). Genomic selection has recently been used in breeding programs to increase genetic gain, but its major limitation is genotyping cost because it requires high-density SNP genotyping (Boichard et al. 2016). Selecting animals based on hub genes affecting a trait, instead of on SNPs, could reduce the cost of genomic selection. Identifying pathways and hub genes associated with SCS is crucial because it can help to understand gene function and the underlying mechanisms. To our knowledge, no study has identified the pathways and hub genes associated with SCS in cattle. Consequently, this study aimed to identify pathways associated with SCS and to use the constructed gene network to identify hub genes affecting SCS, using all genes previously reported to be associated with SCS.

Gene collection
All genes reported to be significantly associated (P < 0.05) with somatic cell score (SCS) in dairy cattle were collected (https://www.animalgenome.org/cgi-bin/QTLdb/BT/gene%20srch?gwords, accessed on 2 May 2022). These significant genes were identified through genome-wide association studies. All genes used in this research were annotated, and each gene was given a unique symbol. All genes with a significant association with SCS in cattle identified up to 2022 were included in this study. The genes were located on 30 of the cattle chromosomes.

GO and pathway enrichment analysis
g:Profiler was used to perform gene ontology (GO) analyses on all selected genes. A False Discovery Rate (FDR) threshold of 0.05 was applied to identify significant GO terms. Enrichment analysis was performed for three types of GO terms: molecular function, cellular component, and biological process. The KEGG, Reactome, and WikiPathways databases were used to identify biological pathways.
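To illustrate how an FDR threshold of 0.05 selects significant terms, the sketch below applies the Benjamini-Hochberg adjustment to a list of raw P values. This is an assumption for illustration only: the text does not state which FDR procedure was configured in g:Profiler, and the term names and P values here are hypothetical toy inputs, not results from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical enrichment P values for five placeholder terms.
terms = ["term_1", "term_2", "term_3", "term_4", "term_5"]
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]

adj_p = benjamini_hochberg(raw_p)
significant = [t for t, q in zip(terms, adj_p) if q < 0.05]
print(significant)  # ['term_1', 'term_2']
```

In this toy input, two terms with raw P values below 0.05 (0.039 and 0.041) are no longer significant after adjustment, which is the practical effect of controlling the FDR across many tested terms.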

Gene network construction
The following steps were used to construct the gene network and to identify hub genes and subnetworks:
Step 1. The g:Profiler results were uploaded into Cytoscape EnrichmentMap (Shannon et al. 2003) to create a map.
Step 2. Cytoscape StringApp (Doncheva et al. 2018) was used to construct a gene network for the map created in step 1. A confidence cutoff of 0.5 was used in constructing the gene network.
Step 3. The gene network created in step 2 was used to identify hub genes. The Cytoscape plugin cytoHubba (Chin et al. 2014) was used to identify hub genes, with maximal clique centrality (MCC) as the network scoring method.
Step 4. The gene network created in step 2 was used to construct subnetworks. The AutoAnnotate Cytoscape application was used to create subnetworks showing which clusters of significant genes are associated with SCS (Kucera et al. 2016).
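Steps 2 and 3 can be sketched in plain Python as follows. This is a simplified illustration and not the cytoHubba implementation: it keeps edges whose confidence score is at least 0.5 and then scores each node with MCC, taken here as the sum of (|C| - 1)! over the maximal cliques C that contain the node, following the definition in Chin et al. (2014) as we understand it. The node labels and edge scores are hypothetical toy values, and giving isolated nodes a score of 0 is a choice made for this sketch.

```python
from math import factorial


def build_adjacency(scored_edges, cutoff=0.5):
    """Keep edges with score >= cutoff; return node -> set of neighbours."""
    adj = {}
    for a, b, score in scored_edges:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if score >= cutoff:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch algorithm."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(adj):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {node: 0 for node in adj}
    for clique in maximal_cliques(adj):
        if len(clique) < 2:  # isolated node: no edges, score stays 0
            continue
        for node in clique:
            scores[node] += factorial(len(clique) - 1)
    return scores


# Hypothetical interaction scores; D-E falls below the 0.5 cutoff.
toy_edges = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
             ("C", "D", 0.6), ("D", "E", 0.4)]
toy_adj = build_adjacency(toy_edges, cutoff=0.5)
toy_scores = mcc_scores(toy_adj)
ranking = sorted(toy_scores, key=lambda n: (-toy_scores[n], n))
print(ranking)  # ['C', 'A', 'B', 'D', 'E']
```

In the toy network, node C ranks first because it belongs to both the triangle A-B-C and the edge C-D, which shows how MCC rewards membership in large, densely connected groups more than degree alone.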

Gene ontology and pathway analysis
From gene detection, QTL detection, and GWAS studies, 168 genes were found to be significantly (P < 0.05) associated with SCS (Hu et al. 2022). Of the 168 genes used for GO analysis, only 62 were enriched in SCS-related pathways (Fig. 1). GO enrichment analysis revealed that these 62 SCS-associated genes were linked with 73 GO terms (Figs. 2 and 3).
The sixteen most significant (P < 0.008) GO terms identified in our research are listed in Table 1. These top GO terms belong to molecular function, biological process, and cellular component, and comprise: endochondral ossification; axon; cytokine-cytokine receptor interaction; growth hormone signaling; regulation of the fatty acid biosynthetic process; fatty acid biosynthetic process; negative regulation of protein acetylation; response to peptide; monocarboxylic acid biosynthetic process; regulation of the fatty acid metabolic process; response to food; monocarboxylic acid metabolic process; lipid biosynthetic process; response to peptide hormone; regulation of lipid biosynthetic process; and synthesis, secretion, and deacylation of ghrelin. Seven of the 16 most significant GO terms and 18 of the 73 total GO terms associated with SCS in the current study are associated with fatty acid and lipid biosynthetic processes.

Identification of hub genes and sub-networks
Our study identified 17 hub genes with a high degree of connectivity among the 62 genes enriched in GO terms (Fig. 4). In Fig. 4, the color of the nodes, ranging from dark red to yellow, reflects connectivity: the most connected genes are colored dark red, and the least connected genes are colored yellow. The 17 hub genes identified in this study, listed in order from dark red to light red in Fig. 4, are CD4, CXCL8, TLR4, STAT1, TLR2, CXCL9, CCR2, IGF1, LEP, SPP1, GH1, GHR, VWF, TNFSF11, IL10RA, NOD2, and PDGFRB.
The analysis of subnetworks yielded ten clusters (Fig. 5). The distribution of node clustering coefficients is shown in Table 2. Cluster 1 contained 29 genes, whereas the other clusters contained between two and four genes. Except for VWF, all identified hub genes were included in cluster 1. Eight genes (CYP27A1, ME3, ACOT7, AGXT2, SLC18A2, RUFY3, PALLD, and CACHD1) were not assigned to any cluster. These eight genes were not connected to other genes or to each other in the constructed network (Fig. 4). Cluster 5 contains four genes, one of which is the hub gene VWF. Clusters 3 and 10 were not linked to other clusters, whereas cluster 1 was linked to other clusters.

Discussion
Mastitis is one of the most economically damaging diseases in dairy cows, and it is classified as clinical, subclinical, or chronic. Mastitis is caused by different types of pathogens. Clinical mastitis is characterized by visible symptoms such as inflammation of the mammary gland, whereas subclinical mastitis is characterized by the absence of visible symptoms but an increase in SCC (Abebe et al. 2016). We expected that the pathways identified in our research would be directly or indirectly associated with the immune system.
Somatic cells are part of the innate immune system and protect the udder against infection. Leukocytes, consisting of macrophages, lymphocytes, and polymorphonuclear leukocytes, are the predominant somatic cell type (Li et al. 2014). Our enrichment analysis shows that cytokine-cytokine receptor interaction, the cytokine-mediated signaling pathway, and the cellular response to cytokine stimulus are pathways associated with SCS. Previous research indicates that most macrophages originate from bone marrow stem cells exposed to cytokines. Leukemia is another factor affecting macrophage production from bone marrow (Forte et al. 2020).
The activity of leukemia inhibitory factor receptors was identified as a GO term related to SCS in the present study. According to studies, lipid metabolism contributes to immune system regulation. Dysregulation of lipid metabolism has been linked to various diseases (Tzeng et al. 2019). In addition, fatty acids play significant roles in lymphocyte metabolic adaptation. Fatty acids regulate adaptive immunity because they are a component of cell membranes and a source of energy.
B and T lymphocytes undergo expansion, proliferation, and differentiation, and they require abundant nutrients that can be supplied by fatty acids (Zhou et al. 2021). In our study, 25 of the 73 GO terms identified for SCS are pathways related to fatty acids and lipids, such as regulation of the fatty acid biosynthetic process, fatty acid biosynthetic process, monocarboxylic acid biosynthetic process, regulation of the fatty acid metabolic process, monocarboxylic acid metabolic process, lipid biosynthetic process, and regulation of lipid biosynthetic process.
In our study, the GO terms regulation of toll-like receptor signaling pathway and toll-like receptor signaling pathway were found to be associated with SCS. In dairy cows with mastitis, the mammary gland becomes inflamed, and the somatic cell count in the milk rises due to the migration of neutrophils and macrophages (Abebe et al. 2016). Toll-like receptors are proteins that play a key role in the innate immune system by initially recognizing microbes (Kawasaki and Kawai 2014).
Malic acid is a dicarboxylic acid; its salts and esters are known as malate. Malate is an intermediate of the Krebs cycle and contributes to energy production (Wei et al. 2021). According to studies, malic acid increases the gene expression of interleukin-8, interleukin-1 beta, tumor necrosis factor-alpha, and lysozyme (Safari et al. 2021). Interleukin-8 is a key mediator of the inflammatory response. In regions of inflammation, interleukin-8 attracts and activates neutrophils (Brennan and Zheng 2007).
Tumor necrosis factor alpha (TNF alpha) is an inflammatory cytokine produced by macrophages and monocytes during inflammation (Idriss and Naismith 2000). Lysozyme is an important defense mechanism and a component of the innate immune system in most mammals. Lysozyme destroys bacteria by hydrolyzing their cell walls (Ragland and Criss 2017). Ciliary neurotrophic factor receptor activity and ciliary neurotrophic factor receptor binding are additional GO terms identified in this study that are directly related to interleukins and the immune system. The ciliary neurotrophic factor belongs to the interleukin-6 cytokine family (Pasquin et al. 2015).
Our pathway analysis reveals that ghrelin synthesis, secretion, and deacylation are among the most significant GO terms associated with somatic cell score. Ghrelin is a hormone produced by the stomach. Ghrelin's primary function in the immune system is anti-inflammatory (Baatar et al. 2011). Growth hormone (GH) signaling, growth hormone receptor signaling, growth hormone synthesis, secretion, and action, and response to growth hormone are growth hormone-related terms identified in our study as being associated with SCS. The T cell is one of the immune cells produced by the thymus gland, and growth hormone plays a crucial role in promoting the expansion of the thymus gland (Napolitano et al. 2008).
CD4, CXCL8, TLR4, STAT1, TLR2, CXCL9, CCR2, IGF1, LEP, SPP1, GH1, GHR, VWF, TNFSF11, IL10RA, NOD2, and PDGFRB were identified by our analysis as hub genes with a high degree of connectivity. According to studies, CD4 is essential in inflammatory conditions such as mastitis. CD4 single nucleotide polymorphisms have been proposed as a molecular marker for mastitis resistance selection in dairy cattle (Zeb et al. 2020). The results reported by Usman et al. (2018) confirmed that CD4 is a candidate gene for both somatic cell count and clinical mastitis in cows. CXCL8 and CXCL9 encode proteins that are important in the inflammatory response (Russo et al. 2014). During mastitis, the expression of CXCL8 increases (Günther et al. 2010). Wang et al. (2020) also reported a statistically significant association between CXCL9 and SCS. Toll-like receptor genes 2 and 4 (TLR2 and TLR4) recognize bacterial lipopolysaccharide and initiate the production of pro-inflammatory cytokines and chemokines (Vaure and Liu 2014).
Fig. 5 The subnetwork identified for somatic cell score
A highly significant association of TLR4 single nucleotide polymorphisms with SCS and clinical mastitis has been documented (Mesquita et al. 2012; Elmaghraby et al. 2018). STAT1 is a crucial component of the innate immune system. Mammalian cells contain seven members of the STAT gene family, and there is a significant correlation between mastitis and STAT gene members (Khan et al. 2020). CCR2 is another hub gene identified by our research. CCR2 encodes a protein that functions as the main chemokine receptor in monocyte recruitment to sites of inflammation (Bakos et al. 2017).
IGF1 was identified as a hub gene among the genes used in our study. IGF1 contributes to T-cell activation in the immune system (Smith 2010), and a significant association between single nucleotide polymorphisms in IGF1 and SCS has been reported (Mullen et al. 2011). The LEP gene plays a role in T cell proliferation, the humoral immune response, and the inflammatory response (Li et al. 2021). T cells produce osteopontin, a multifunctional cytokine encoded by the SPP1 gene (Zheng et al. 2021). A significant association between SPP1 polymorphisms and mastitis has been established (Alain et al. 2009).
Two SCS hub genes, GH1 and GHR, are associated with growth hormone. Important immune system functions of growth hormone include stimulating the proliferation of T and B cells, immunoglobulin synthesis, and regulating the cytokine response (Meazza et al. 2004). There is an association between growth hormone single nucleotide polymorphisms and SCS (Mullen et al. 2010).
VWF, a hub gene related to SCS, was placed in a separate cluster from the other hub genes identified in the current study during subnetwork analysis. The VWF-encoded protein functions in coagulation. Complement (also known as the complement cascade) is a component of the immune system that helps phagocytic cells remove microbes and damaged cells from an organism, and complement and coagulation are related (Donat et al. 2019). A significant association exists between SCS and an SNP in VWF (Kim et al. 2021). TNFSF11, one of the hub genes identified in our study, is a gene whose products are expressed in T cells to stimulate dendritic cells, and an SNP in this gene is associated with SCS (Marete et al. 2018). Our study identified the IL10RA gene as a hub gene for SCS. This gene mediates the immunosuppressive signal of leukocyte-produced interleukin-10 (Shouval et al. 2014). Verschoor et al. (2009) observed a correlation between IL10RA SNPs and SCS. NOD2 is related to innate immunity and inflammation and is reported to be a candidate gene for mastitis in cows (Moretti et al. 2021). The final hub gene identified in our research was PDGFRB, which has been linked to SCS and sixteen other traits in cattle (Cole et al. 2011).
In our study, the genes associated with SCS formed 10 clusters. The genes in each cluster are expected to be involved in the same pathway and may share co-expression patterns. Cluster 1 contains most of the genes and hub genes, indicating that the same co-expressed genes regulate the most important SCS-related pathways. Inflammatory and cytokine responses were the main functions of numerous hub genes and pathways identified in our study. Consequently, the identified hub genes may be regarded as the main genes in genomic selection for SCS.
In this study, we used genes previously linked to SCS and identified significant pathways associated with SCS. Using the constructed gene networks, 17 hub genes related to SCS were identified (CD4, CXCL8, TLR4, STAT1, TLR2, CXCL9, CCR2, IGF1, LEP, SPP1, GH1, GHR, VWF, TNFSF11, IL10RA, NOD2, and PDGFRB). Our research uncovered the underlying mechanisms and significant genes affecting SCS, which may facilitate genomic selection for SCS. In sire summaries for dairy cattle, SCS provides information for selecting mastitis-resistant animals. In recent years, genomic selection has become a useful tool for improving the accuracy of predictions in dairy cows. The findings of our study provide knowledge of the genetic basis and biological mechanisms controlling SCS. Therefore, the significant hub genes reported in our study can be used in genomic selection programs to select cows for mastitis resistance.

Conclusion
In this study, we used genes previously reported to be associated with SCS and identified significant pathways related to SCS. Using the constructed gene networks, we identified 17 hub genes (CD4, CXCL8, TLR4, STAT1, TLR2, CXCL9, CCR2, IGF1, LEP, SPP1, GH1, GHR, VWF, TNFSF11, IL10RA, NOD2, and PDGFRB) related to SCS. Our results reveal the underlying mechanisms and important genes affecting SCS, which may facilitate genomic selection for SCS.

Fig. 1 Venn diagram showing the number of gene-set terms enriched across the GO, KEGG, WikiPathways, and Reactome databases

Fig. 2 Gene ontology analysis of genes associated with somatic cell score

Table 1 Most significant GO terms associated with somatic cell score