Large-Scale Metadata Analysis of Ovary Based Multi-Omics Datasets for Understanding the Genes Regulating Litter Size

Background : Litter size is a very important production index in the livestock industry, which is controlled by various complex physiological processes. To understand and reveal the common gene expression patterns involved in controlling prolificacy, we have performed a large-scale metadata analysis of five genome-wide transcriptome datasets of pig and sheep ovary samples obtained from high and low litter groups, respectively. We analyzed separately each transcriptome dataset using GeneSpring v14.8 software by implementing standard, generic analysis pipelines and further compared the list of most significant and differentially expressed genes obtained from each dataset to identify genes that are found to be common and significant across all the studies. Results : We have observed a total of 62 differentially expressed genes common among more than two gene expression datasets. The KEGG pathway analysis of most significant genes has shown that they are involved in metabolism, the biosynthesis of lipids, cholesterol and steroid hormones, immune system, cell growth and death, cancer-related pathways and signal transduction pathways. Of these 62 genes, we further narrowed the list to the 25 most significant genes by focusing on the ones with fold change >1.5 and p<0.05. These genes are CYP11A1, HSD17B2, STAR, SCARB1, IGSF8, MSMB, SERPINA1 , FAM46C, HEXA, PTTG1, TIMP1, FAM167B, CCNG1, FAXDC2, HMGCS1, L2HGDH, Lipin1, MME, MSMO1, PARM1, PTGFR, SLC22A4, SLC35F5, CCNA2, CENPU, CEP55, RASSF2, and SLC16A3 . Conclusions : Interestingly, comparing the list of genes with the list of genes obtained from our literature search analysis, we found only three genes in common. These genes are HEXA, PTTG1, and TIMP1. Our finding points to the potential of a few genes that may be important for ovarian follicular development and oocyte quality. Future studies revealing the function of these genes will further our understanding of how litter size is controlled in the ovary while also providing insight on genetic selection of high litter gilts.


Abstract
Background : Litter size is a very important production index in the livestock industry, which is controlled by various complex physiological processes. To understand and reveal the common gene expression patterns involved in controlling prolificacy, we have performed a large-scale metadata analysis of five genome-wide transcriptome datasets of pig and sheep ovary samples obtained from high and low litter groups, respectively. We analyzed separately each transcriptome dataset using GeneSpring v14.8 software by implementing standard, generic analysis pipelines and further compared the list of most significant and differentially expressed genes obtained from each dataset to identify genes that are found to be common and significant across all the studies.
Results : We have observed a total of 62 differentially expressed genes common among more than two gene expression datasets. The KEGG pathway analysis of most significant genes has shown that they are involved in metabolism, the biosynthesis of lipids, cholesterol and steroid hormones, immune system, cell growth and death, cancer-related pathways and signal transduction pathways. Of these 62 genes, we further narrowed the list to the 25 most significant genes by focusing on the ones with fold change >1. 5  Conclusions : Interestingly, comparing the list of genes with the list of genes obtained from our literature search analysis, we found only three genes in common. These genes are HEXA, PTTG1, and TIMP1. Our finding points to the potential of a few genes that may be important for ovarian follicular development and oocyte quality. Future studies revealing the function of these genes will further our understanding of how litter size is controlled in the ovary while also providing insight on genetic selection of high litter gilts.

Background
Increasing litter size and raising healthy, quality animals are crucial factors in the livestock industry [1]. Understanding the genetic traits controlling the reproductive physiology has become an important research topic in the last decade [1]. Litter size is a complex trait in multiparous mammals.
It is dependent on ovarian development, ovulation rate, placental health, uterine capacity, embryonic and foetal survival rate, many of which affect the availability and quality of the oocyte [2,3]. Pigs (Sus scrofa) are one of the most important livestock. In addition, due to its similarities with humans, pigs are highly used as a model organism in human medical research. For more than 150,000 years, pigs have been considered one of the most highly divergent species. To date, 12 Sus scrofa subspecies have been reported worldwide with 142 different breeds, respectively. Although they have different meat production performance, previous studies have reported the Chinese Meishan breed of having an average litter size of 14.3 piglets, the Iberian breed having an average of 7 piglets per litter and the Berkshire breed with an average of about 8.9 piglets per litter [2]. Thus, it is desirable to develop a highly reproductive breeding line using an efficient set of prolificacy-related genes.
Metadata is the data associated with a corresponding article which is generally provided as supporting or additional data required for drawing valuable conclusions. Development of public repositories such as NCBI Gene Expression Omnibus (NCBI GEO), ArrayExpress, and many more are  [4]. This study has majorly reported genes encoding the immune system, maternal homeostasis and fatty acid metabolic enzymes involved in the steroidogenesis pathway [4]. Amanda et al. (2011) have also identified 27 differentially expressed genes which were found to be co-localized with quantitative trait locus (QTL) of litter size traits [4].
Another study compared gene expression patterns of ovarian follicles of Chinese Taihu (highly prolific sows) and large white low litter sows, and identified 133 differentially expressed genes that function in development and signal transduction processes [5]. Zhang et al. (2015) also performed a genomewide expression analysis to reveal the gene expression differences between high and low litter size in Yorkshire pigs [6]. A similar study that was performed in sheep reported a total of 1252 genes that were differentially expressed in Hertian sheep (low litter) compared to Qira sheep (high prolificacy).
The KEGG pathway analysis of these genes found them to be associated with steroid biosynthesis, steroid hormone biosynthesis, TGF-β, insulin, Wnt, Notch, and other signaling pathways [7]. These studies provide first-line screening data on litter size-related gene expression in various breeds and conditions. However, further analysis is required to extract the most meaningful information from this vast amount of data. We hypothesize that large-scale metadata analysis of these datasets will allow comparison of ovarian litter size-related gene expression across different species, breed and conditions to ultimately help identify commonly observed genes among these datasets. This may provide insight on potential novel gene(s) and/or pathway(s) to focus on for understanding regulation of ovarian follicular development and competent oocyte availability.  (Table-1). We have also retrieved the list of genes and proteins which were found to be involved in regulating the litter size in pig, mouse and sheep from UniProt and NCBI databases, respectively. A total of 96 genes related to litter size distributed among Sus scrofa, Capra hircus, Mus musculus, Ovis aries were retrieved from UniProt databases. From Harmonizome databases, (https://amp.pharm.mssm.edu/Harmonizome/) we have retrieved three gene sets: abnormal (270 genes), increased (11 genes) and decreased (260 genes) litter size gene sets. We further compared all the retrieved genes above and tried to retrieve the common genes among all the gene sets.  [7]. All the datasets mentioned above were retrieved based on their experimental study and design as they focused on understanding the differentially expressed genes in ovarian samples grouped as high and low litter experimental subjects of pigs and sheeps, respectively. The GSE21383 and GSE23985 datasets were imported into GeneSpring® version 14.8 software using "Import experiments from NCBI GEO" wizard. The samples were log transformed, baseline transformed and normalized using the RMA normalization method.

Methods
The experimental grouping information obtained from the available metadata was used to group the samples. All the samples were then subjected to quality control analysis using a filter based on expression values using the standard (100-higher percentile and 20-lower percentile). After quality control analysis, we performed a statistical analysis using a moderated t-test with asymptotic p-value method and Benjamini Hochberg multiple testing correction method for p-value correction. Pathway Enrichment Analysis using KOBAS: As obtained from our metadata analysis, the five datasets showed significant and differentially expressed genes and were further subjected to pathway enrichment analysis using KOBAS web software (http://kobas.cbi.pku.edu.cn/kobas3) [8]. We have 17β-estradiol (E2), progesterone, activin A, etc. [9]. Earlier studies have also reported that inactive homozygous mutations in transforming growth factor β, BMP15, and GDF9 superfamily genes leads to decreased ovulation rates which further leads to sterility in dairy goat [10,11] (Supplementary Information-S1). Previous studies have reported that proteins such as insulin growth factor 1 (IGF1), oviductin (OVGP1), tissue inhibitor of metalloprotein (TIMP1), uteroglobin, leptin and plasminogen activator inhibitor 1 (PAI1) were reported to contribute to sperm capacitation, gamete fertilization and the facilitation of the entrance of the embryo into the uterus [12][13][14][15][16][17]. The genetic polymorphisms of the steroid hormone-related genes such as progesterone receptor (PR), estrogen receptor (ER), and steroid receptor co-activators (SRC1 and SRC2) were found to be associated with the risk of implantation failure [18][19][20][21]. Similarly, prostaglandins play an important role in various reproductive processes such as ovulation and implantation [22,23]. We have also retrieved the MPO genephenotype associated gene sets [24,25] encoding for abnormal, increased and decreased litter size from Harmonizome databases [26]. We have retrieved a total of 270 genes, which were common among the abnormal, increased and decreased litter size gene sets. Interestingly, when we compared the list of litter size-regulating genes of C. hircus, S. scrofa and M. musculus retrieved from UniProt and NCBI databases, 26 genes were found to be commonly identified in these studies. These genes ( Fig. 1). From the list of significantly expressed genes, we further narrowed down the list to include only genes with a fold change FC > 1.5. The distribution of those highly significant and differentially expressed genes is shown in Fig. 1. In addition, the significantly differentially expressed genes were further listed as up-or down-regulated genes. Their full gene names and fold change information are detailed in Table-2A and Table-2B, respectively.

Pathway Enrichment Analysis
We have performed the pathway enrichment analysis separately using the up-and down-regulated list of genes obtained from the metadata analysis. A total of 83 and 69 pathways were obtained for the up-and down-regulated genes, out of which 38 and 49 pathways were found to be significant (pvalue < 0.05), respectively (Table 4A and

Discussion
In this study, we have retrieved a list of genes found to be involved in both positive and negative regulation of litter size in S. scrofa, O. aries, Mus musculus, Rattus norvegicus and Capra hircus from UniProt and NCBI repositories. Earlier studies have reported that the reproductive system of animals is regulated by an arsenal of hormones [5]. The litter size gene list retrieved from public repositories majorly represented genes encoding for hormones. Interestingly, when we compared the list of litter size-regulating genes of C. hircus, S. scrofa and M. musculus retrieved from public repositories, it resulted in a total of 26 genes among the compared list of genes. Studies conducted in the past have mainly reported the involvement of the commonly observed genes mentioned above to be involved in regulation of litter size. We have also retrieved and compared the decreased, increased and abnormal litter size gene sets from Harmonizome databases. In comparison, all of the genes obtained from our literature analysis and differentially expressed genes obtained from our metadata analysis have shown 3 genes in common, which are HEXA, PTTG1 and TIMP1. Hexosaminidase-α (HEXA) and its isozymes HEXB and HEXS together have the capacity to breakdown a variety of substrates such as G M2 gangliosides, glycolipids, glycosaminoglycans and glycoproteins, which for the most part contain β-linked N-acetylglucosamine and N-acetyl galactosamine residues [27]. According to Juneja (2002), the hexosaminidase knockout mice exhibited reduced fertility at a young age, which progressively decreased with increased age and ultimately lead to infertility [27]. However, the HEXB knockout mice were found to develop normally and be fertile during the early stages of development. These results indicate that hexosaminidase is not required for sperm-ovum interactions and fertilization [27]. PTTG1 gene encodes for pituitary tumor-transforming gene, an oncogene which is found to play a key role in cell cycle regulation and sister chromatid separation [28]. Previous studies have reported that PTTG1 is highly expressed in various tumors, especially ovarian tumorigenesis. However, the exact involvement of PTTG1 in fertilization and with respect to litter size has not yet been reported. TIMP1 gene encodes for tissue inhibitor of metalloproteinase, it is found to play a key role in ovulation. Rosewell et al. (2013) demonstrated that gonadotropin-induced increase in TIMP protein in human periovulatory follicles could help regulate the follicular extracellular matrix and other TIMP associated processes with ovulation with an increase in TIMP inhibitor [29].

Biosynthesis and Metabolism of Lipids, Cholesterol and Steroidogenesis
Genes involved in steroid biogenesis and ovarian steroidogenesis pathways such as Cyp11A1, Msmo1, Star, Dhrs7/Hsd17b7, and Hmgcs1 were up-regulated in high litter group samples. Cyp11A1, also known as cholesterol side-chain cleavage enzyme, is a mitochondrial enzyme which helps with the biosynthesis of various steroid hormones [30] According to Gharani et al. (1997), allelic variants of Cyp11a1 might cause hyperadrogenaemia which may further lead to changes in ovarian morphology [31]. This study also reported that Cyp11a plays a significant role in the progression of hirsutism in polycystic ovary syndrome conditions [31]. Msmo1 gene is also involved in the biosynthesis of steroids and the production of zymosterol from lanosterol. The gene expression studies conducted by role by providing retinoic acid, and that a lack of Aldh1a1 in fetal ovaries lead to delayed germ cell meiosis [33]. Dhrs7, also known as 17β-hydroxysteroid dehydrogenase, is involved in the conversion of estrone (E1) to estradiol (E2) and is highly expressed in the ovaries of pregnant animals. According to Pasi et al. (2000), spatial and temporal expression of Dhrs7/Hsd17b7 in the uterus indicates that locally-produced estradiol plays a crucial role in implantation [34]. Faxdc2 (fatty acid hydroxylase domain 2), which is involved in cholesterol synthesis, had megakaryopoiesis highly up-regulated in all high litter samples [35]. Lipin1 is a central metabolic regulator found to play an important role in lipid metabolism, especially the glycerolipid and glycerophospholipid metabolic pathways. Lipin1 gene and its polymorphisms were also found to be involved in the development of PCOS. Gowri et al. (2007) reported that Lipin1 is down-regulated by estradiol in the uterus and liver, and that the expression levels of Lipin1 is low and compromised in mouse models with diabetes and/or reduced fertility [36].
Earlier studies have reported that Lipin1 deficient mice (fld/fld) have less body fat and exhibit symptoms of diabetes and impaired fertility [37][38][39]. Regulation of Lipin1 by estrogen plays a critical role between reproduction, growth and metabolism [40].

Genes exhibiting a role in the progression of polycystic ovary syndrome
The above obtained list of differentially expressed genes were found to be play a minor to major role in the development and progression of PCOS. Previous studies have reported that genes encoding for

Genes involved in the immune response, cancer, cell growth and death-related pathways
Genes involved in immune responsive pathways such as CD55 (decay-accelerating factor for complement-55) and OAS1 (2'-5'-oligoadenylate synthetase 1) were highly up-regulated in high litter group samples. OAS1D is a cytoplasmic protein expressed in growing oocytes and early embryos (Wei

Conclusions
In our present study, we have conducted a large-scale metadata analysis of genome-wide multi-omic datasets of pig and sheep in hopes of better understanding the factors affecting litter size. To the best of our knowledge, this is the first study to report on the metadata analysis of gene expression studies focused on litter size for understanding and revealing common gene expression patterns regulating fecundity. We found that 62 genes were significant and differentially expressed between the high and low proliferating groups of sows and sheep. KEGG pathway analysis of these significant genes indicates that some of these genes are involved in the metabolism of lipids and cholesterol, steroid hormone biosynthesis, especially ovarian steroidogenesis, immune system pathways, and cancer overview pathways. A literature search of 42 highly expressed significant genes among high prolificacy sows has revealed that these genes are involved in fertilization, implantation and embryo development. Results obtained in our study have also proposed that genes which are involved in the progression of PCOS were also found to exhibit a similar pattern in low prolificacy groups. From what we know, our study is the first attempt made to understand the common gene expression patterns between high and low prolificacy groups. Our present study provides highly significant genetic information that might contribute to a better understanding of the molecular mechanisms involved in high and low prolificacy variations.     Table 3 not provided with this version.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download. SupplementaryInformation.xlsx