Meta-analysis of genome-wide association studies for litter size in sheep

DOI: https://doi.org/10.21203/rs.3.rs-770225/v1

Abstract

Background

Litter size and ovulation rate are important reproduction traits in sheep and have important impacts on the profitability of farm animals. To investigate the genetic architecture of litter size, we report the first meta-analysis of genome-wide association studies (GWAS) using 522 ewes and 564,377 SNPs from six sheep breeds.

Results

We identified 29 significant associations for litter size which 27 of which have been not reported in GWASs for each population. However, we could confirm the role of BMPR1B in prolificacy. Our gene set analysis discovered biological pathways related to cell signaling, communication, and adhesion. Functional clustering and enrichment using protein databases identified epidermal growth factor-like domain affecting litter size. Through analyzing protein-protein interaction data, we could identify hub genes like CASK, PLCB4, RPTOR, GRIA2, and PLCB1 that were enriched in most of the significant pathways. These genes have a role in cell proliferation, cell adhesion, cell growth and survival, and autophagy.

Conclusions

Notably, identified SNPs were scattered on several different chromosomes implying different genetic mechanisms underlying variation of prolificacy in each breed. Given the different layers that make up the follicles and the need for communication and transfer of hormones and nutrients through these layers to the oocyte, the significance of pathways related to cell signaling and communication seems logical. Our results provide genetic insights into the litter size variation in different sheep breeds.

Introduction

Reproduction is one of the most important economic traits in sheep. Identification of candidate genes and their causal mutations is an effective tool to understand the genetic mechanism underlying the variation in reproductive performance in sheep (1). Fertility, fecundity, and prolificacy are all indicators of reproductive performance in sheep. Fertility is considered as the number of lambing per year, fecundity is measured as the number of lambs produced each year, and prolificacy is estimated as the size of the litter (2). So far, a variety of fecundity genes have been discovered in sheep. The first underlying mutation linked to increased prolificacy in sheep was identified in the Bone Morphogenetic Protein Receptor 1B (BMPR1B) gene (35). Also, other variants associated with increased prolificacy in sheep have been reported in BMP15 (FecX) (6), GDF9 (FecG) (7), and, B4GALNT2 (FecL) (8) genes. Since these identified mutations could be breed-specific and even population-specific, further study of the genome in different breeds/populations is recommended to identify novel causal variations associated with prolificacy in sheep.

The advent of the high-throughput genotyping technologies that facilitate the genotyping of thousands of single-nucleotide polymorphisms (SNPs) has made it possible to utilize the Genome-wide association studies in identifying molecular markers associated with quantitative traits which provide new opportunities for marker-assisted selection and genomic selection. This could allow breeding programs to become more efficient and result in much more increases in response to selection (9, 10).

Several GWAS for reproduction traits in sheep have been conducted in recent years and successfully reported genetic markers. Demars et al. (2013) identified a region close to the BMP15 gene on chromosome x and reported two non-conservative mutations called FecXGR and FecXO associated with litter size and ovulation rate in the Grivette and Olkuska breeds, respectively. Gholizadeh et al. (2014) identified Significant SNPs on chromosomes 10 and 15 (within the DYNC2H1 gene) associated with litter size across the four parities in Baluchi sheep. Benavides and Souza. (2018) performed a GWAS in Vacaria sheep population in which, an associated causal SNP within the GDF9 gene (Vacaria mutation) segregates. Calvo et al., (2020) identified the FecXGR allele associated with prolificacy in 158 Rasa Aragonesa ewes. They also identified a novel polymorphism in exon 2 of BMP15 using the PCR-sequencing approach. Smołucha et al. (2021) reported a genome-wide significant SNP located within the EPHA6 gene associated with litter size in Polish Mountain sheep. Esmaeili-Fard et al. (2021) reported NTRK2 as a novel candidate gene associated with litter size in Baluchi sheep. Xia et al., (2020) also identified both 105,603,287 lncRNA and its target gene, NTRK2, as significantly upregulated genes associated with prolificacy in the Han sheep.

When it comes to exploring candidate genes and discovering new mutations underlying variation in the trait of interest, optimal power to correctly reject the null hypothesis is crucial. Combining datasets could result in more power. Meta-analysis is an approach that makes it possible to combine data from various studies. These techniques also provide an opportunity for a quantitative assessment of the findings’ accuracy or inconsistency/heterogeneity through multiple studies (17). Also, the protein-protein interaction (PPI) networks are of special interest because the genes usually act as a group to achieve a collective goal. The networks may correlate to specific functions. PPI networks provide valuable information in the understanding of the cellular function and biological processes.

In the previous study, (1) through six independent GWAS in high and low prolific sheep breeds (six breeds), identified multiple candidate genes (BMPR1B, FBN1, MMP2, GRIA2, SMAD1, CTNNB1, NCOA1, INHBB, NF1, FLT1, PTGS2, PLCB3, ESR2, ESR1, GHR, ETS1, MMP15, FLI1, and SPP1) associated with litter size. Our study aimed to investigate genetic mechanisms underlying litter size in sheep using the meta-analysis approach. Meta-analysis of these six independent GWAS could improve the power to identify more variants associated with litter size. We also conducted gene set analysis based on the results of meta-analysis as a complementary approach to investigate litter size from a biological perspective. Also, With the tremendous increase in protein interaction data, we applied network-based analyses to understand molecular mechanisms of litter size.

Materials And Methods

GWAS and Meta-Analysis

We considered genotypes and phenotypes of 522 ewes from five sheep breeds of high (Wadi, n = 160; Hu, n = 117; Icelandic, n = 54; Finnsheep, n = 54; and Romanov, n = 78) and one low (Texel, n = 59) prolificacy. The GWA analysis of these breeds for prolificacy has already been published by (1) and details about sample collection, phenotyping, and genotyping can be found in that paper. All the sheep breeds were genotyped for almost 600k positions using the Ovine Infinium HD BeadChip (Illumina, San Diego, CA, United States). First, conducting six separate GWAS, we have reproduced all GWAS results for each breed. Quality control of genotyping was carried out using the GenABEL (18) R package. SNPs with call rates < 95%, minor allele frequency (MAF) < 0.05, and p-value of Fisher’s exact test for Hardy–Weinberg equilibrium < 10-5 excluded from the analysis. Also, samples with call rates ≤ 95% were removed from datasets.

To remove the parity effect, the mean litter size of each individual (total litter size/parity) was used as the response variable. Based on the distribution of phenotypes, ewes in each breed, coded as high- and low- prolific (for details see (1)). To account for the within-population substructure, we implemented a principal component analysis (PCA) using a set of LD-pruned SNPs and two first PC were incorporated in the model. Plink1.9 (19) and GenABEL (18) were used, respectively. We ran the following logistic model to evaluate the association between SNPs and litter size in each breed: see formula 1 in the supplementary files section.

where, y is the vector of case/control value (1 and 2) for each ewe; XSNP is the incidence matrix relating values in y vector to SNP genotypes, and βSNP is the regression coefficient. The analysis was performed using the GenABEL (18) package. The p-values were almost non-inflated (0.87 ≤ λ ≤ 1.14) for all traits and partial inflation was corrected using the genomic control (GC) method.

A single-trait meta-analyses were undertaken across GWAS results from six sheep breeds using the fixed effects inverse variance weighted method implemented in the METAL (20) software. This meta-analytical approach works based on the GWAS SNPs summary statistics - beta and its standard errors - and is considered the most powerful method to identify novel loci (21). For multiple testing corrections, based on the p-value cutoff of 0.05 and the total number of pruned SNPs (independent tests) we determined the genome-wide threshold (31,752/0.05=1.57e-06). Also, based on the average number of SNPs on each chromosome and the same P-value cutoff, we determined a chromosome-wise (suggestive) threshold (0.05/1176=4.25e-05). The CMplot (https://github.com/YinLiLin/R-CMplot) R package was used for drawing the Manhattan plot.

Functional Annotation

We explored the functional significance of the top SNPs from the meta-analysis using the GTF file from Oar_v.4.0 ovine reference genome assembly. The search was performed using a 15 k base pair window around the top SNPs. GeneCards (https://www.genecards.org/, last access: 20 April 2021) and National Center for Biotechnology Information Gene (http://www.ncbi.nlm.nih.gov/gene/, last access: 20 April 2021) were used to reveal gene functions. The sheep QTL database (http://www.animalgenome.org/cgi-bin/QTLdb/OA/index, last access: 20 April 2021) was used to find whether the identified SNPs located in known prolificacy-related QTLs.

Gene-set enrichment and functional clustering analysis

For enrichment analysis, we need to construct a vector contains significant gene names. As the trait has polygenic nature, several poorly associated SNPs fall below the significant line and the large majority of the genetic variants contributing to the trait remain hidden. So, to capture the cumulative effects of poorly associated variants, we used an arbitrary threshold (0.001) to filter SNPs for constructing gene vector. Using an in-home R script and Oar_v.4.0 ovine reference genome assembly, we wrapped SNPs around the genes and assigned SNPs to genes if they were located within the genes or 15 kb upstream or downstream of a gene (list of significant genes in gene vector are provided in Supplementary File 1, sheet 5).

To extract biological views from the resulted gene vector, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases were used to define the functional gene-sets. KEGG enrichment analysis was performed using the R package clusterProfiler (22), but due to the low gene id conversion rate (to ensemble id), we used g:Profiler (23), a web-based enrichment tool, for GO enrichment which led to better conversion rate and results. Terms and pathways with FDR and q-value less than 0.05 (GO and KEGG, respectively) were reported as significant sets. The ggplot2 (24) R package was used to visualize enrichment results as bar plots. For enrichment using protein databases, Cytoscape 3.8.2 (25) along with StringApp 1.6.0 plug-in (26) were used.

Significant genes (official gene symbol) were loaded in DAVID (27,28) for functional clustering analysis against the ovine genes as background. This tool classifies genes in functionally related groups based on gene ontology, protein data, and pathway data, and calculates enrichment scores for groups of genes with a related biological function. Analysis was performed using the DAVID default setting. The functional cluster with the greatest enrichment score was reported.

Protein-protein interactive (PPI) network construction and module identification

To investigate the interactions of proteins encoded by the identified significant genes, the StringApp 1.6.0 (26) plugin in Cytoscape 3.8.2 (25) was used to build the PPI network using the STRING 11.0 database (29). The MCODE 2.0.0 (30) plugin, another Cytoscape App, was used to identify the very closely associated regions within the network which are known as the core modules of the PPI. MCODE identifies the highly interacting nodes (clusters) present in the network which has high quality based on topology, function and, localization similarity of proteins within predicted complexes. Degree cut-off 4 along with the 2 k‐core were used as thresholds. Each node gets a score and clusters will be ranked using scoring by the density and size of the cluster. Clusters with a higher score have more interaction between genes. To identify key genes in the constructed network, another Cytoscape App, cytoHubba 0.1 (31) was used to rank the individual nodes and finding hub genes. The hub genes were identified based on degree parameters, which represent the total number of nodes connected to its adjacent nodes. It also provides the count of the number of interactions of a given node. The top ten hup genes have been reported.

Results

Single-trait meta-analysis of GWAS results

We conducted, a single-trait meta-analysis of GWAS for prolificacy across six sheep breeds. Here, 29 SNPs associated with prolificacy were identified. Among them, only two SNPs reached the genome-wide significance level (Fig. 1). The QQ plot did not show any inflation of the test statistics (λgc = 1.027, Fig. 2).

We identified known and novel QTL across six sheep breeds. The summary of significant SNPs in the meta-analysis is presented in Table 1. Consistent with previous reports, OAR3_OAR6_29362224 (rs416717560), an SNP within the bone morphogenetic protein (BMP) receptor-1B (BMPR1B/FecB), was associated with litter size at the genome-wide significance level. Also, meta-analysis identified 27 suggestive SNPs that some of them have been already reported in other GWASs for litter size. The identified SNPs were scattered on various chromosomes implying different genetic mechanisms underlying variation of prolificacy in each breed.

Table 1

Genome- and chromosome-wide significant GWAS meta-analysis results.

SNP

Chr

Position (bp)

Major/minor

allele

Genes (15kb)

Adjusted p-value (GC)

OAR3_OAR14_23199877

14

23133427

G/A

LOC105608122

5.55E-07

OAR3_OAR6_29362224

6

29295803

G/A

BMPR1B *

1.04E-06

OAR3_OAR2_47930582

2

47968324

G/A

LOC105607546 *

2.06E-06

OAR1_274928289.1

1

254159304

A/G

TMEM108 *

3.71E-06

OAR3_OAR3_177468317

3

177241585

A/G

LOC105614803

4.03E-06

OAR3_OAR2_132705930

2

132717299

A/G

MTX2 *

4.82E-06

OAR3_OAR1_197796181

1

197637524

G/A

LOC105602706

5.46E-06

OAR3_OAR22_3829898

22

3811473

C/A

PCDH15 *

7.59E-06

OAR3_OAR19_59955075

19

59939988

G/C

KLF15 *

8.09E-06

OAR3_OAR12_42069297

12

41992138

C/A

SLC25A33 *

8.13E-06

OAR3_OAR12_42070709

12

41993550

G/A

SLC25A33 *

8.13E-06

OAR3_OAR3_9708115

3

9652824

G/A

MVB12B *

9.59E-06

OAR3_OAR1_10606057

1

10587862

A/G

AGO1 *

9.97E-06

OAR3_OAR21_31104689

21

31046572

G/A

ETS1 *

1.06E-05

OAR3_OAR2_75981497

2

76028277

G/A

PTPRD *

1.22E-05

OAR3_OAR2_132690893

2

132702262

C/A

MTX2 *

1.26E-05

OAR3_OAR10_2844118

10

2830516

G/A

DIAPH3 *

1.46E-05

OAR3_OAR2_140444547

2

140460303

A/G

B3GALT1 *

1.77E-05

S65625.1

2

140457860

A/G

B3GALT1 *

1.77E-05

OAR3_OAR6_114562470

6

114447690

G/A

ADRA2C

1.84E-05

OAR3_OAR13_61618860

13

61522307

A/G

SUN5, BPIFB2 *

2.56E-05

OAR3_OAR2_61245281

2

61286687

G/A

NMRK1, LOC106990898

2.91E-05

OAR3_OAR3_177454989

3

177228288

G/A

LOC105614803

2.99E-05

OAR3_OAR18_43948718

18

43894548

G/A

SNX6, LOC101103761

3.11E-05

OAR3_OAR1_233450483

1

233307971

A/C

MBNL1

3.49E-05

OAR3_OAR5_77138988

5

77064581

G/A

LOC101120408 *

3.81E-05

OAR3_OAR7_1507850

7

1495859

G/A

EPB41L4A *

3.83E-05

OAR3_OAR16_15290229

16

15296602

G/A

RNF180 *

3.84E-05

OAR3_OAR14_9381054

14

9355176

A/G

CDH13 *

3.89E-05

SNPs with boldface are significantly associated with genes but otherwise are suggestive SNPs. Star (*) shows that identified SNP is located within the gene.

Gene-set enrichment and functional clustering analysis

The result of the meta-analysis was complemented with gene set enrichment analysis using the GO, KEGG, and some protein databases. We also performed a functional clustering analysis to see how identified terms and pathways cluster. In total, from 564,377 SNP tested in the meta-analysis, 342,340 SNP were located within or 15 kb upstream or downstream of 28,168 genes in the Oar_v.4.0 ovine reference genome assembly. About 431 genes, out of 28,168 genes contained at least one significant SNP (P-value ≤ 0.001) and defined as significantly associated genes with the litter size. GO terms and KEGG pathways with an adjusted P-value ≤ 0.05 were reported as significant (Fig. 2).

Several GO terms and KEGG pathway related to signaling showed an overrepresentation of significant genes, including, signaling (GO:0023052), signal transduction (GO:0007165), trans-synaptic signaling (GO:0099537), Glucagon signaling pathway (oas04922), Calcium signaling pathway (oas04020), Phospholipase D signaling pathway (oas04072), PI3K-Akt signaling pathway (oas04151), AMPK signaling pathway (oas04152), and Estrogen signaling pathway (oas04915). In addition, terms and pathways related to cell communication, cell junction, cell adhesion, and synapse were among the significant categories.

Enrichment analysis using protein databases showed that several genes in our gene vector, significantly enriched in categories related to epidermal growth factor-like domain (EGF-like domain) (Table 2). It means that enriched genes in identified protein categories contain at least one EGF-like domain in a more or less conserved form, a sequence in which about thirty to forty amino-acid residues long is found in the sequence of epidermal growth factor. The effect of EGF in follicular development and prolificacy has been stressed (32). To identify clusters of functionally related molecular functions, the DAVID functional annotation clustering tool was used. Clusters were ranked according to the enrichment score for each cluster reflecting the geometric mean of all the enrichment p-values (EASE scores) of each annotation term in the cluster. Here, we reported the first cluster with an enrichment score of 1.77 (Table 3).

Table 2

Enrichment results using protein databases.

Category *

Term name

Description

Count

P-value (FDR)

SMART Domains

SM00181

Epidermal growth factor-like domain.

13

0.0033

UniProt Keywords

KW-0245

EGF-like domain

13

0.0043

InterPro Domains

IPR000742

EGF-like domain

14

0.0116

InterPro Domains

IPR013032

EGF-like, conserved site

11

0.0326

Pfam

PF00008

EGF-like domain

7

0.04

* Protein databases provide functional analysis of proteins by classifying them into families and predicting domains and important sites.
This cluster contains seven categories from protein databases that four of them related to the EGF and EGF-like domain and three of them were associated with the laminin globular (G) domain (Lam G). The Laminin G is about 180 amino acid long domain found in a large and diverse set of extracellular proteins. Enrichment results using all databases are provided in Supplementary File 1, sheet 1 to sheet 4.

Table 3

Functional annotation clustering result. First top annotated cluster is presented (enrichment score: 1.77).

Category

Term name

Description

Count

P-value

SMART Domains

SM00282

LamG

6

0.0008

SMART Domains

SM00181

EGF

11

0.001

InterPro Domains

IPR001791

Laminin G domain

6

0.001

InterPro Domains

IPR013032

EGF-like, conserved site

10

0.002

InterPro Domains

IPR000742

Epidermal growth factor-like domain

11

0.002

InterPro Domains

IPR013320

Concanavalin A-like lectin/glucanase, subgroup

9

0.019

UniProt Keywords

KW-0245

EGF-like domain

6

0.045

PPI network construction and module identification

Using the SringApp plugin, protein-protein interaction data of 431 significant genes were extracted from the STRING v 1.6.0 database and visualized using Cytoscape. After removing genes (proteins) with no PPI information, 186 nodes and 254 edges remained in the network. The MCODE plugin was applied to discover the hub clusters in the network. The most significant module was shown in Figs. 3A and B within the red dashed circles. It comprises 5 nodes and 10 edges. Hub genes were selected using the cytoHubba plugin. Figure 3 shows the networks constructed by 10 (Fig. 3A) and 15 hub genes (Fig. 3B). Figure 3A also shows the first-stage nodes. Top ten hub genes in order were CASK, PLCB4, RPTOR, GRIA2, PLCB1, GRID1, PPP3CC, PPP3CA, TMEM201, and PIK3C2G. Identified hub genes were submitted for the pathway analysis but due to the same results compared with the full gene list, we didn’t present the results.

Discussion

In this study, meta-analysis was used to combine the results of the six independent GWAS studies on litter size. In recent years, several studies have been conducted to identify causal variants underlying the genetic of economically important traits in livestock. Although genotyping costs have notably been reduced, GWA studies in large scale size are still costly (33). The use of small sample sizes may not provide enough power to identify variants with small effect sizes (34), and these variants require large sample sizes, especially when their allelic frequency is low in the population. In addition, the findings of small-scale genomic studies might not be confirmed in larger studies, which is more common in candidate gene studies (35). Meta-analysis of multiple GWA studies as an appropriate achievement has opened ups possibly to reach enough sample size and sufficient power to identify variants with small effect sizes.

In this study, through meta-analysis of six GWAS results, we identified two genome-wide significant SNPs, OAR14_23199877 and OAR6_29362224 on chromosomes 14 and 6, respectively.

Functional annotation revealed that OAR6_29362224 is located within the BMPR1B (FecB) gene. This gene has previously been reported by Xu et al. (2018) for Hu sheep as candidate genes associated with litter size. The BMPR1B encodes a protein that is a member of the intracellular serine/threonine kinase signaling domain. The ligands of this receptor are BMPs, which are members of the TGF-beta superfamily. Booroola (FecB) ewes have an increased ovulation rate and an average litter size of one to two extra lambs per copy of the FecB gene (5). The BMPR1B gene can lead to an increased density of the follicle-stimulating hormone (FSH) and luteinizing hormone (LH) receptors with a concurrent reduction in apoptosis to increase the ovulation rate of ewes (36, 37).

In addition, the results of the meta-analysis identified 27 other variants that were significantly associated with litter size at the chromosome level. Except for the ETS1 gene, other candidate genes reported by Xu et al. (2018) were not identified in our meta-analysis on this data set. It has been reported that the ETS1 gene affects ovulation in bovine by its link to the regulator of protein signaling protein-2 (RGS2) (38). Also, further investigation revealed that many of these SNPs are located within known ovine genes that are associated with reproduction.

Gene-set enrichment analyses using the GO, KEGG, and some protein databases were applied to complement meta-analysis. GWAS is an efficient approach to identify variants that their effects are large enough. However, complex traits are controlled by a large number of variants with small effects, which make it difficult to identify these variants in this way, even with large sample sizes (39). There are also causal variants located in non-coding regions that make it difficult to understand the biological mechanisms that control traits. Gene set enrichment analysis provides an opportunity to address both of these problems (40). In this study, both GO and KEGG analyses led to the identification of several terms and signaling pathways related to reproductive functions.

Both GO and KEGG enrichment identified several terms and pathways related to signaling and cell communication processes (Fig. 2). Given the different layers that make up the follicles and the need for communication and transfer material through these layers to the oocyte, the significance of these pathways seems logical. Identified pathways are frequently reported in previous studies (15, 16, 41, 42). PI3K-Akt signaling pathway has been identified in multiple publications associated with reproduction (15, 42). Various normal cellular processes, such as cell proliferation, survival, motility, growth, cytoskeletal rearrangement, and metabolism, through multiple downstream targets, are regulated by the PI3K/PTEN/Akt pathway (43). Abnormalities in the PI3K-Akt signaling pathway may associate with some status of female infertility attributed to primordial follicle depletion, including premature ovarian failure and primary amenorrhea (44). Studies identified the Calcium signaling pathway as a significant KEGG term associated with prolificacy (15, 42). Also, (1) reported calcium ion binding GO term as pathway affecting prolificacy in Chinese sheep. Studies have demonstrated that calcium signaling has a key role in the oocyte’s meiotic maturation (45, 46). It has been reported that the absence of glucagon signaling can result in a poor uterine environment for fetal growth. The lack of glucagon signaling during pregnancy was related to a decreased litter size, increased neonatal mortality, and severe intrauterine growth restriction concluding that glucagon signaling has a key function during mating and pregnancy maintenance (47). In addition, Ouhilal et al. (2012) reported that lack of glucagon signaling during pregnancy alters the maternal metabolic milieu that can result in intrauterine growth restriction and fetal demise. The Phospholipase D signaling pathway was also reported by Esmaeili-Fard et al. (2021) as a significant term associated with litter size in Baluchi sheep. It is documented that AMPK is involved in the regulation of ovarian functioning in females (49). Some studies show that AMPK regulates the steroidogenesis of granulosa cells and oocyte maturation (5052). Leptin, adiponectin, ghrelin, and resistin as some metabolic hormones may to some extent act through the AMPK signaling. These hormones also have roles in the regulation of reproductive functions in both males and females through the hypothalamus-pituitary-gonadal axis (49). Estrogens as ovarian sex hormones are main regulators of the female reproductive functions and responsible for cellular proliferation and growth of tissues involved in reproduction (53). Also, several candidate genes for reproduction such as BMPR1B, identified in the present study, are related to the estrogen signaling pathway (54).

In this study, functional annotation clustering led to a cluster containing seven categories among which four of them related to the EGF and EGF-like domain and three of them associated with the laminin globular (G) domain (Lam G). Shimada et al. (2016) identified the epidermal growth factor (EGF)-like factor that affects EGF receptor and then induces the ERK1/2 and Ca2+‐PLC pathways in cumulus cells. Down‐regulation of these two pathways leads to the notable suppression of induction of cumulus expansion and oocyte maturation implying that both pathways are inducers of the ovulation process.

In the present study, protein-protein interaction data were extracted and used to discover the hub genes and module identification. The first three identified hub genes include CASK, PLCB4, and RPTOR. The CASK gene encodes a calcium/calmodulin-dependent serine protein kinase that is a member of the membrane-associated guanylate kinase (MAGUK) protein family. MAGUKs are scaffolding proteins associated with intercellular junctions (56). It may mediate a link between the extracellular matrix and the actin cytoskeleton via its interaction with syndecan and with the actin/spectrin-binding protein. It has been shown that CASK regulates the proliferation and adhesion of epidermal keratinocytes (57). Oliver et al. (2019) in a GWA analysis identified PLCB4 as a master regulator associated with spontaneous abortion in cattle. PLCB4 belongs to the phospholipase C-beta family and is associated with the same family as the leading-edge, PLCB1. As a master regulator, PLCB4 regulates PLCB1 and has a function in calcium signaling (59), which is essential for appropriate uterine quiescence and smooth muscle contractions during gestation (58). In addition, PLCB4 plays a role in the embryonic development of internal and external facial structures (60), with disruptions in development possibly driving to fetal death if development is compromised (58). Notably, PLCB4 was enriched in most of the identified pathways related to signaling. Another identified hub gene, RPTOR (Regulatory Associated Protein of MTOR Complex 1), encodes part of a signaling pathway regulating cell growth and survival, and autophagy in response to nutrient and hormonal signals (insulin levels). This gene encodes an evolutionarily conserved protein with multiple roles in the mTOR pathway (61).

Conclusions

In conclusion, we performed the first GWAS meta-analysis of litter size in six sheep populations. We identified BMPR1B, a gene with confirmed roles in prolificacy. We also identified 26 loci that were not identified in the independent studies. Most of the significant genes were enriched by biological pathways related to cell signaling, communication, and adhesion. Given the different layers that make up the follicles and the need for communication and transfer material through these layers to the oocyte, the significance of these pathways seems logical. Enrichment using protein databases and functional clustering identified the Epidermal growth factor-like domain in association with litter size. Also, our network analysis, identified hub genes like CASK, PLCB4, and RPTOR that were enriched in most of the significant pathways. These results can provide further knowledge and a better understanding of genetic variants and genes underlying litter size, a complex trait in sheep.

Declarations

Acknowledgments

Not applicable.

Supplementary material

Supplementary File1 (sheet 1). KEGG pathways significantly (P ≤ 0.05) enriched using genes associated with litter size.

Supplementary File1 (sheet 2). GO terms. GO terms significantly (P ≤ 0.05) enriched using genes associated with litter size in Baluchi sheep.

Supplementary File1 (sheet 3). Protein categories were significantly (P ≤ 0.05) enriched using genes associated with litter size.

Supplementary File1 (sheet 4). Functional clustering result.

Supplementary File1 (sheet 5). Identified significant genes used for enrichment and network analyses.

Availability of data and materials

The datasets analyzed during the current study are available on https://www.animalgenome.org/repository/pub/CAAS2018.0302/.

Competing Interest

The authors declare that they have no competing interests

Financial support statement

This research received no specific grant from any funding agency, commercial or not-for-profit section

Ethics approval

Not applicable.

Consent for publication

Not applicable.

Author Contributions

Mohsen. Gholizadeh: Project administration, Conceptualization, Resources, Writing – original draft, Writing – review & editing.

Seyed Mehdi Esmaeili-Fard: Conceptualization, Methodology, Formal analysis, Software, Visualization, Writing – original draft, Writing – review & editing. All authors read and approved the final manuscript.

References

  1. Xu S-S, Gao L, Xie X-L, Ren Y-L, Shen Z-Q, Wang F, et al. Genome-wide association analyses highlight the potential for different genetic mechanisms for litter size among sheep breeds. Front Genet. 2018;9:118.
  2. Smołucha G, Gurgul A, Jasielczuk I, Kawęcka A, Miksza-Cybulska A. A genome-wide association study for prolificacy in three Polish sheep breeds. J Appl Genet. 2021;62(2):323–6.
  3. Souza CJH, MacDougall C, Campbell BK, McNeilly AS, Baird DT. The Booroola (FecB) phenotype is associated with a mutation in the bone morphogenetic receptor type 1 B (BMPR1B) gene. J Endocrinol. 2001;169(2):R1.
  4. Mulsant P, Lecerf F, Fabre S, Schibler L, Monget P, Lanneluc I, et al. Mutation in bone morphogenetic protein receptor-IB is associated with increased ovulation rate in Booroola Merino ewes. Proc Natl Acad Sci. 2001;98(9):5104–9.
  5. Wilson T, Wu X-Y, Juengel JL, Ross IK, Lumsden JM, Lord EA, et al. Highly prolific Booroola sheep have a mutation in the intracellular kinase domain of bone morphogenetic protein IB receptor (ALK-6) that is expressed in both oocytes and granulosa cells. Biol Reprod. 2001;64(4):1225–35.
  6. Galloway SM, McNatty KP, Cambridge LM, Laitinen MPE, Juengel JL, Jokiranta TS, et al. Mutations in an oocyte-derived growth factor gene (BMP15) cause increased ovulation rate and infertility in a dosage-sensitive manner. Nat Genet. 2000;25(3):279.
  7. Hanrahan JP, Gregan SM, Mulsant P, Mullen M, Davis GH, Powell R, et al. Mutations in the genes for oocyte-derived growth factors GDF9 and BMP15 are associated with both increased ovulation rate and sterility in Cambridge and Belclare sheep (Ovis aries). Biol Reprod. 2004;70(4):900–9.
  8. Drouilhet L, Mansanet C, Sarry J, Tabet K, Bardou P, Woloszyn F, et al. The highly prolific phenotype of Lacaune sheep is associated with an ectopic expression of the B4GALNT2 gene within the ovary. PLoS Genet. 2013;9(9):e1003809.
  9. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–29.
  10. Goddard ME, Hayes BJ, Meuwissen THE. Genomic selection in livestock populations. Genet Res (Camb). 2010;92(5–6):413–21.
  11. Demars J, Fabre S, Sarry J, Rossetti R, Gilbert H, Persani L, et al. Genome-wide association studies identify two novel BMP15 mutations responsible for an atypical hyperprolificacy phenotype in sheep. PLoS Genet. 2013;9(4):e1003482.
  12. Gholizadeh M, Rahimi-Mianji G, Nejati-Javaremi A, De Koning DJ, Jonas E. Genomewide association study to detect QTL for twinning rate in Baluchi sheep. J Genet. 2014 Aug;93(2):489–93.
  13. Benavides M V, Souza H. How efficiently Genome-Wide Association Studies (GWAS) identify prolificity-determining genes in sheep. Genet Mol Res. 2018;17(2).
  14. Calvo JH, Chantepie L, Serrano M, Sarto MP, Iguacel LP, Jiménez MÁ, et al. A new allele in the BMP15 gene (FecXRA) that affects prolificacy co-segregates with FecXR and FecXGR in Rasa aragonesa sheep. Theriogenology. 2020;144:107–11.
  15. Esmaeili-Fard SM, Gholizadeh M, Hafezian SH, Abdollahi-Arpanahi R. Genome-wide association study and pathway analysis identify NTRK2 as a novel candidate gene for litter size in sheep. PLoS One. 2021;16(1):e0244408.
  16. Xia Q, Li Q, Gan S, Guo X, Zhang X, Zhang J, et al. Exploring the roles of fecundity-related long non-coding RNAs and mRNAs in the adrenal glands of small-tailed Han Sheep. BMC Genet. 2020;21:1–11.
  17. Zeggini E, Ioannidis JPA. Meta-analysis in genome-wide association studies. 2009;
  18. Aulchenko YS, Ripke S, Isaacs A, Van Duijn CM. GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007;23(10):1294–6.
  19. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
  20. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–1.
  21. Pereira T V, Patsopoulos NA, Salanti G, Ioannidis JPA. Discovery properties of genome-wide association signals from cumulatively combined data sets. Am J Epidemiol. 2009;170(10):1197–206.
  22. Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. Omi a J Integr Biol. 2012;16(5):284–7.
  23. Raudvere U, Kolberg L, Kuzmin I, Arak T, Adler P, Peterson H, et al. g: Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 2019;47(W1):W191–8.
  24. Wickham H. ggplot2: elegant graphics for data analysis. Springer; 2016.
  25. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–504.
  26. Doncheva NT, Morris JH, Gorodkin J, Jensen LJ. Cytoscape StringApp: network analysis and visualization of proteomics data. J Proteome Res. 2018;18(2):623–32.
  27. Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13.
  28. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44.
  29. Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019;47(D1):D607–13.
  30. Bader GD, Hogue CW V. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics [Internet]. 2003/01/13. 2003 Jan 13;4:2. Available from: https://pubmed.ncbi.nlm.nih.gov/12525261
  31. Chin C-H, Chen S-H, Wu H-H, Ho C-W, Ko M-T, Lin C-Y. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst Biol. 2014;8(4):1–7.
  32. Rajkovic A, Pangas SA, Matzuk MM. Follicular development: mouse, sheep, and human models. Knobil Neill’s Physiol Reprod. 2006;1:383–424.
  33. Sargolzaei M, Schenkel FS. QMSim: a large-scale genome simulator for livestock. Bioinformatics. 2009;25(5):680–1.
  34. Seminara D, Khoury MJ, O’brien TR, Manolio T, Gwinn ML, Little J, et al. The Emergence of Networks in Human Genome Epidemiology:" Challenges and Opportunities". Epidemiology. 2007;18(1):1–8.
  35. Panagiotou OA, Evangelou E, Ioannidis JPA. Genome-wide significant associations for variants with minor allele frequency of 5% or less—an overview: a HuGE review. Am J Epidemiol. 2010;172(8):869–89.
  36. Regan SL, McFarlane JR, O’Shea T, Andronicos N, Arfuso F, Dharmarajan A, et al. Flow cytometric analysis of FSHR, BMRR1B, LHR and apoptosis in granulosa cells and ovulation rate in merino sheep. Reproduction. 2015;150(2):151–63.
  37. Hu X, Pokharel K, Peippo J, Ghanem N, Zhaboyev I, Kantanen J, et al. Identification and characterization of mi RNA s in the ovaries of a highly prolific sheep breed. Anim Genet. 2016;47(2):234–9.
  38. Sayasith K, Sirois J, Lussier JG. Expression and regulation of regulator of G-protein signaling protein-2 (RGS2) in equine and bovine follicles prior to ovulation: molecular characterization of RGS2 transactivation in bovine granulosa cells. Biol Reprod. 2014;91(6):131–9.
  39. Sham PC, Purcell SM. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet. 2014;15(5):335–46.
  40. Zhu X, Stephens M. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat Commun. 2018;9(1):1–14.
  41. Esmaeili-Fard M, Hafezian SH, Gholizadeh M, Arpanahi RA. Gene set enrichment analysis using genome-wide association study to identify genes and biological pathways associated with twinning in Baluchi sheep. Anim Prod Res. 2019;8(2).
  42. Hernández-Montiel W, Martínez-Núñez MA, Ramón-Ugalde JP, Román-Ponce SI, Calderón-Chagoya R, Zamora-Bustillos R. Genome-Wide Association Study Reveals Candidate Genes for Litter Size Traits in Pelibuey Sheep. Animals. 2020;10(3):434.
  43. Makker A, Goel MM, Mahdi AA. PI3K/PTEN/Akt and TSC/mTOR signaling pathways, ovarian dysfunction, and infertility: an update. J Mol Endocrinol. 2014;53(3):R103–18.
  44. Adhikari D, Gorre N, Risal S, Zhao Z, Zhang H, Shen Y, et al. The safe use of a PTEN inhibitor for the activation of dormant mouse primordial follicles and generation of fertilizable eggs. PLoS One. 2012;7(6):e39034.
  45. Tosti E. Calcium ion currents mediating oocyte maturation events. Reprod Biol Endocrinol. 2006;4(1):1–9.
  46. Homa ST. Calcium and meiotic maturation of the mammalian oocyte. Mol Reprod Dev. 1995;40(1):122–34.
  47. Vuguin PM, Kedees MH, Cui L, Guz Y, Gelling RW, Nejathaim M, et al. Ablation of the glucagon receptor gene increases fetal lethality and produces alterations in islet development and maturation. Endocrinology. 2006;147(9):3995–4006.
  48. Ouhilal S, Vuguin P, Cui L, Du X-Q, Gelling RW, Reznik SE, et al. Hypoglycemia, hyperglucagonemia, and fetoplacental defects in glucagon receptor knockout mice: a role for glucagon action in pregnancy maintenance. Am J Physiol Metab. 2012;302(5):E522–31.
  49. Tosca L, Chabrolle C, Dupont J. L’AMPK: un lien entre métabolisme et reproduction? médecine/sciences. 2008;24(3):297–300.
  50. Tosca L, Chabrolle C, Uzbekova S, Dupont J. Effects of metformin on bovine granulosa cells steroidogenesis: possible involvement of adenosine 5′ monophosphate-activated protein kinase (AMPK). Biol Reprod. 2007;76(3):368–78.
  51. Tosca L, Crochet S, Ferré P, Foufelle F, Tesseraud S, Dupont J. AMP-activated protein kinase activation modulates progesterone secretion in granulosa cells from hen preovulatory follicles. J Endocrinol. 2006;190(1):85–97.
  52. Tosca L, Froment P, Solnais P, Ferré P, Foufelle F, Dupont J. Adenosine 5′-monophosphate-activated protein kinase regulates progesterone secretion in rat granulosa cells. Endocrinology. 2005;146(10):4500–13.
  53. Vrtačnik P, Ostanek B, Mencej-Bedrač S, Marc J. The many faces of estrogen signaling. Biochem medica. 2014;24(3):329–42.
  54. Mendoza N, Castro JER, Sánchez Borrego R. A multigenic combination of estrogen related genes are associated with the duration of fertility period in the Spanish population. Gynecol Endocrinol. 2013;29(3):235–7.
  55. Shimada M, Umehara T, Hoshino Y. Roles of epidermal growth factor (EGF)-like factor in the ovulation process. Reprod Med Biol. 2016;15(4):201–16.
  56. Atasoy D, Schoch S, Ho A, Nadasy KA, Liu X, Zhang W, et al. Deletion of CASK in mice is lethal and impairs synaptic function. Proc Natl Acad Sci. 2007;104(7):2525–30.
  57. Ojeh N, Pekovic V, Jahoda C, Määttä A. The MAGUK-family protein CASK is targeted to nuclei of the basal epidermis and controls keratinocyte proliferation. J Cell Sci. 2008;121(16):2705–17.
  58. Oliver KF, Wahl AM, Dick M, Toenges JA, Kiser JN, Galliou JM, et al. Genomic Analysis of Spontaneous Abortion in Holstein Heifers and Primiparous Cows. Genes (Basel). 2019;10(12):954.
  59. Rebecchi MJ, Pentyala SN. Structure, function, and control of phosphoinositide-specific phospholipase C. Physiol Rev. 2000;80(4):1291–335.
  60. Clouthier DE, Passos-Bueno MR, Tavares ALP, Lyonnet S, Amiel J, Gordon CT. Understanding the basis of auriculocondylar syndrome: Insights from human, mouse and zebrafish genetic studies. In: American Journal of Medical Genetics Part C: Seminars in Medical Genetics. Wiley Online Library; 2013. p. 306–17.
  61. Kim D-H, Sarbassov DD, Ali SM, King JE, Latek RR, Erdjument-Bromage H, et al. mTOR interacts with raptor to form a nutrient-sensitive complex that signals to the cell growth machinery. Cell. 2002;110(2):163–75.