Multi-omics data reveal genes that are relevant for muscular amino acids in the common carp

Fish muscular amino acids are a series of essential nutrients that embrace essential amino acids, branched-chain amino acids and avorous amino acids. Previous studies have found that amino acids have important physiological effects on sh growth and development, as they are involved in maintaining nitrogen balance and in the formation of enzymes and hormones. Amino acids, such as aspartic acid, glutamic acid, glycine and alanine, that can have a signicant effect on sh umami taste are called avorous amino acids. Nevertheless, the studies on the genetic mechanisms of amino acid metabolism in the common carp (Cyprinus carpio) are still limited. The purpose of this study was to examine the divergent patterns at the genomic, transcriptomic and epigenomic levels in sh with different amino acid contents. Genome-wide association analysis using 195 individuals of common carp was conducted, and 62 genes were identied to be associated with glycine, proline, and tyrosine content. RNA-Seq of samples with extreme contents of essential amino acids, branched-chain amino acids and avorous amino acids was applied using brain, liver and muscle tissues, resulting in 1,643 differentially expressed genes. Whole-genome bisulte sequencing identied 3,108 genes with differentially methylated promoters. Through the enrichment analysis of transcriptome and DNA methylation results, we screened out a series of enriched pathways associated with amino acid metabolism, including various categories of pathways spanning growth regulation, lipid metabolism, the citrate cycle and other signaling pathways. Integrated studies demonstrated prominent correlations between DGE and DMP for amino acid contents trait in brain and muscle tissues. acid muscular acid results and gene proling with DMPs, we uncovered key genes which were enriched in EAA, BCAA and FLA metabolism pathways. Integrative results of RNA-Seq and DNA methylation indicated signicant correlations at two multi-omics levels. The comprehensive analysis also identied genes related to amino acid synthesis, transportation and utilization. It is desirable to validate these results by extensive sequencing with a larger population. The results will accelerate the application of molecular selection breeding for common carp eshy with high nutritional value.


Results
The purpose of this study was to examine the divergent patterns at the genomic, transcriptomic and epigenomic levels in sh with different amino acid contents. Genome-wide association analysis using 195 individuals of common carp was conducted, and 62 genes were identi ed to be associated with glycine, proline, and tyrosine content. RNA-Seq of samples with extreme contents of essential amino acids, branched-chain amino acids and avorous amino acids was applied using brain, liver and muscle tissues, resulting in 1,643 differentially expressed genes. Whole-genome bisul te sequencing identi ed 3,108 genes with differentially methylated promoters. Through the enrichment analysis of transcriptome and DNA methylation results, we screened out a series of enriched pathways associated with amino acid metabolism, including various categories of pathways spanning growth regulation, lipid metabolism, the citrate cycle and other signaling pathways. Integrated studies demonstrated prominent correlations between DGE and DMP for amino acid contents trait in brain and muscle tissues.

Conclusion
In summary, the multi-omics data revealed candidate genes and pathways correlated with amino acid metabolism. These results will promote the process of the genomic selection and breeding strategy in muscular amino acid contents of sh.

Background
Common carp (Cyprinus carpio) is one of the most widely cultured freshwater shes all over the world. Most C. carpio are cultured in China, with an annual production of around three million tons. There are abundant populations of C. carpio in China, such as Yellow River carp, Songpu mirror carp, Hebao red carp, Xingguo red carp, Oujiang color carp and many hybrid strains [1]. Among these, the Yellow River carp occupies an important position due to its high nutritional value. The composition and content of amino acids are the key factors affecting the nutritional value and taste of the Yellow River carp.
Amino acids have been traditionally divided into essential amino acids (EAAs) and nonessential amino acids (NEAAs) according to whether the organism can synthesize the amino acids in vivo from the food resources. Presently, the de nition of functional amino acids (FAAs) has received increasing attention.
Functional amino acids are those that in uence health, survival, growth, development, cell signaling, immune response and reproduction of the organisms by modulating key metabolic pathways [2]. Arginine, cysteine, leucine, methionine, tryptophan, tyrosine, aspartate, glutamic acid, glycine, proline and taurine have been de ned as FAAs in human nutrition [3]. Amino acids are also divided into branchedchain amino acids (BCAAs, including leucine, valine and isoleucine) and straight-chain amino acids according to their chain shape. BCAAs regulate many key signaling pathways-the activation of the mTOR signaling pathway, for example [4]. In recent years, novel functions of BCAAs have been demonstrated, such as lipolysis and glucose consumption [4]. BCAAs, especially leucine, improve the ability of muscle protein synthesis through insulin-dependent and insulin-independent pathways [5].
Aspartic acid, glutamic acid, glycine and alanine, which have signi cant effects on sh umami taste, are called avorous amino acids (FLAs).
In addition to the effects on meat quality and taste, the composition and content of amino acids are of great biological signi cance [6]. Fish growth is largely regulated by hormones and nutritional levels in a complex regulatory network. However, in this complex regulatory network, growth hormone (GH) and insulin-like growth factors (IGFs) play major roles. Amino acids can directly or indirectly act on the GH/IGF growth axis, thereby affecting sh growth [7]. The α-ketoacid, the catabolic product of amino acid, is regulated by the metabolic pathways of sugar or lipids with different characteristics. The αketoacid can be resynthesized into new amino acids, converted into sugars or fats, or oxidized and decomposed into CO 2 and H 2 O in the tricarboxylic acid cycle with the release of energy [8].
The protein content of most sh is 15%−20% of the total mass, and EAAs account for 20%−50% of the total amino acid content [6]. Researchers studying the fatty acid composition, mineral composition and EAA content of 12 sh species found that the protein content of these sh was 18.7%−25.5%, and the EAA content was 1.93 to 8.73 mg/g. The content of crude protein and fat in teleosts varied widely, and the content of EAAs (35.8% ± 2.6%) only accounted for half the amount of NEAAs (64.2% ± 2.6%) [9]. The quality of food protein is determined by two parameters of amino acids: composition and content. EAAs are obtained from the diet, and each amino acid plays different roles in human metabolism. A recent study demonstrated that differences in sh muscle composition are under genetic control that encodes production of different types of proteins or amino acids [10]. However, due to the complexity of amino acid traits, it is di cult to signi cantly enhance the nutritional composition of muscle by traditional breeding strategies.
The early research was often limited to speci c genes related to amino acid synthesis, but such an approach is insu cient to fully understand the mechanisms and pathways that underlie the traits. With the prompt progress of high-throughput sequencing technology, more research has adopted the methods of resequencing, transcriptome, methylation, proteome and metabolome analyses to explore the molecular network mechanisms of traits from different perspectives. The genome-wide association study (GWAS) on the citrulline metabolism of watermelon identi ed two key candidate genes (ferrochelatase and acetolactate synthase) that were signi cantly related to citrulline content [11]. A multi-locus GWAS model was used to dissect biosynthetic processes of the free amino acids (FAAs) in bread wheat [12]. A total of 15 candidate genes affecting the anabolism of FAAs were identi ed. One candidate gene encoding tryptophan decarboxylase was associated with the content of tryptamine in vitro. A GWAS of amino acid content in oysters obtained 417 signi cant SNPs, and the genetic network analysis found that three node SNP regions were associated with glycogen, protein and aspartate content [13]. Solexa/Illumina RNA-Seq was employed to generate 52.5 million reads of mRNA sequences from Chinese perch fast muscle samples, and 21 amino acid transporter genes were annotated using protein and gene ontology databases [14]. Despite these results from previous studies, the gene regulatory network is complicated and remains obscure due to insu cient multi-omics evidence. Multi-omics studies usually combine two or more types of omics data, such as transcriptomics, genomics, epigenetics, proteomics and metabolomics. In teleosts, some biological phenomena have been uncovered through multi-omics methods. To examine molecular mechanisms of cold tolerance in T. fasciatus, Wen et al. integrated mRNA-Seq and metabolomics of T. fasciatus liver tissue to establish an mRNA-protein−metabolite interaction network associated with cold tolerance; their study identi ed differentially expressed metabolites, genes and proteins involved in membrane transport, fatty acid metabolism and signal transduction [15]. These results indicated that nutrient substances could be added to the fodder to help the overwintering of T. fasciatus. Wei et al. integrated transcriptomic and metabolomic analyses of esh quality in the large yellow croaker; their results showed that feeding dietary hydroxyproline could improve esh quality through changes in comprehensive metabolic pathways, including lipid metabolism, collagen synthesis metabolism and glycolysis [16]. A multi-omics analysis of the dynamics of muscular metabolites in leopard coral grouper that combined transcriptomic and metabolomic methods indicated that branched-chain amino acids played a role in energy production in the sh muscle tissues during the fasting period [17]. Zhang et al. conducted DNA methylation sequencing and comparative transcriptome analyses that uncovered the molecular mechanisms of body color variation in the crucian carp [18].
Integrating the methods of GWAS, transcriptome and methylation analyses, Zhang et al. identi ed candidate genes and pathways associated with fatty acid metabolism in the common carp [19]. Here, we conducted multi-omics analysis of muscular amino acid content in the Yellow River carp using genomic, transcriptomic and epigenomic data. This study demonstrates the potential metabolic mechanisms of amino acids in the Yellow River carp and provides an experimental foundation for improving the meat quality of Yellow River carp by genome-assisted breeding.

Results
Genotyping and phenotyping of C. carpio samples Based on previous work [1,19,20], 199 samples were randomly collected from a cultivated population of C. carpio. The raw genotyping data with 184,978 SNPs for the 199 samples were obtained from SNP genotyping. A total of 95,400 polymorphic SNPs from 195 samples passed the quality control threshold. As shown in Figure 1A and Table S1, the contents of glutamic (GLU), aspartic (ASP), leucine (LEU) and lysine (LYS) accounted for a relatively large proportion of the total amino acid content, and the distribution of the other 16 amino acids was relatively balanced. The gender information of samples was collected and it showed no signi cant differences between males and females across all traits. To discover the potential relationship among multiple traits, the contents of 19 amino acids were analyzed using the correlation heatmap shown in Figure 1B. GLU, ASP, THR, LYS and PHE contents were dramatically associated with each other, possibly indicating similar functions within the amino acid content regulating mechanism.
Genome-wide association analysis In total, 36, 6, and 1 SNPs were identi ed for glycine (GLY), proline (PRO) and tyrosine (TYR), respectively, with a threshold of P < 5.241× 10 -7 (Table S2). The Manhattan plot showed the most promising results for GLY content, while the Q-Q plot indicated the reliability of the data analysis (Figure 2A and B). Similar results for PRO and TYR content are shown in Figure S1. The distributions of SNPs on each chromosome were calculated as shown in Figure 2C, which re ects the relatively even distribution among all of the chromosomes. Genes were annotated through the new common carp genome [21], and 54, 10 and one genes were identi ed for GLY, PRO and TYR traits, respectively.

Transcriptomic analysis of divergent amino acid content
We examined the amino acid contents of 20 newly collected sh samples and selected six individuals with relatively extreme amino acid content. The EAA, BCAA and FLA contents of the 20 samples are presented in Table S3. Taken from our previously published data, the quality of RNA sequencing data is shown in Tables S4 and S5. As EAA and BCAA content were highly correlated, these two categories were integrated as one group in the following analyses.
Plenty of differentially expressed genes were found for the trait EAA (or BCAA), with 236, 81 and 960 genes in brain, liver and muscle tissues, respectively (Figure 3A-C and Tables S6, S8 and S10). The Venn diagram for EAA (or BCAA) trait showed that two, thirteen and eight genes were shared in pairwise comparisons among three tissues ( Figure 3D). The cluster analyses of the differentially expressed genes for EAA (or BCAA) in three tissues are shown in Figures 3E and S2.
For the analysis of the FLA trait, 818, 74 and 3 differentially expressed genes were found in three tissues, respectively ( Figure 4; Tables S6, S8 and S10). Very limited numbers of genes were shared among three the tissues, which could be due to the diversi ed gene functions in different tissues ( Figure 4D). The cluster studies of the differentially expressed genes for FLA in three tissues are shown in Figures 4E and   S2.
The network analyses of differentially expressed genes for the three classi ed amino acids in multiple tissues are shown in Figures S3-7 and Tables S7, S9 and S11. The network of GO enrichment pathways are illustrated in Figures 5, S8, S9 and Tables S12 and S13, while the KEGG enrichment analysis is shown in Table S14.

Differential methylation analysis
The epigenetic difference analyses between the high and low amino acid content groups were conducted and the DNA was isolated from muscle tissues of six samples for WGBS. Taken from our previously reported data, the quality of WBGS data was shown in Table S15. A volcano plot of the EAA (or BCAA) shows the distribution of methylation differences in all the DMRs ( Figure 6A). DMRs were classi ed into several genomic regions including exons, introns, promoters and intergenic repeat regions ( Figure 6B). Figure 6C shows the length distribution of DMRs and most regions were shorter than 1000 bp. Similar results were observed in the DMRs identi cation for the FLA trait ( Figure 6D F). The DMRs within the promoter regions (differentially methylated promoters, DMPs) were chosen for further functional enrichment analysis due to the signi cance of DMPs in the regulation of transcription [22]. The GO enrichment analysis and KEGG enrichment analysis are shown in Tables S17 and S18.

Multi-omics analysis of DGE and DMP results
Through DGE and DMP studies, some related genes and pathways have been found, but the interactions between methylated sites and altered expression are still unclear. Obviously it is considerable to implement a multi-omics analysis through the DGE and DMP aspects. Because very few intersections were observed between the liver DGE and DMP results, we concentrated on the relatedness between DGE and DMP results in brain and muscle tissues (Table S19). After Pearson correlation analysis, signi cant linear correlations (Figure 7) were found in the EAA (or BCAA) and FLA traits of muscle and brain tissues.
These genes were divided into two types, positively related and negatively related. For the EAA (or BCAA) trait in muscle, 57 genes were included in the correlation analysis. In addition, 36 genes of the FLA trait in brain tissues were obtained in the correlation analysis. For the EAA (or BCAA) trait in muscle tissues, 24 positively related genes and 33 negatively related genes were presented in Figure 7A. Totally 12 positively related genes and 24 negatively related genes for FLA trait in brain tissues were found in Figure 7B.

Discussion
Genes associated with muscular amino acid content by GWAS Among the identi ed 36 SNPs associated with GLY content in C. carpio, 32 were located on chromosome 25. Two SNPs (carp102954 and carp102953) were located within the intron and exon regions of the glud1 (glutamate dehydrogenase 1) gene. Glutamate dehydrogenase is a zinc protein that plays a major role in amino acid metabolism in multiple tissues such as liver, heart, muscle and kidney. Glutamate dehydrogenase, as a crucial enzyme in cray sh, was reported to be involved in bridging amino acid metabolism and carbohydrate metabolism, and the enzyme was regulated by reversible phosphorylation in severe hypoxia-tolerant conditions [23]. A recent study in rice revealed that glutamate dehydrogenase mediated amino acid metabolism after ammonium uptake and also enhanced rice growth under aeration conditions [24]. A number of amino acid or peptide transporters were observed in the associated genes list, including abcb10 (ATP-binding cassette sub-family B member 10) and slc1a4 (neutral amino acid transporter A). The gene abcb10 encoded a protein that was involved in heme biosynthesis [25], while slc1a4 encoded a protein that was reported as an important transporter for multiple neutral amino acids [26]. The gene myoz1 (myozenin-1) was identi ed near the downstream area of the SNP carp022265, which was previously reported to function in myo brillogenesis, and may affect myo brillar protein content in muscle [27,28]. Only a few SNPs and genes were identi ed to be associated with PRO and TYR content, and no signi cant markers were found to be associated with other amino acid contents, probably due to the limited number of samples. Nevertheless, dozens of SNPs exceeded the suggestive threshold in other traits, indicating that more associated SNPs might be identi ed with larger samples in the future. As the genes discovered in GWAS analysis may not be enough to illustrate the underlying mechanism of muscular amino acid metabolism, and no associated SNPs were found in the classi ed amino acids (EAA, BCAA, and FLA), we sought to uncover more evidence for the three categories of amino acids by performing gene expression and epigenetic changes analyses.
Network analysis based on RNA-Seq data Both BCAAs and EAAs play crucial roles in the physiological functions of organisms, while the FLAs are the main source of the sh umami and avor. We focused on these three types of classi ed amino acids using RNA-Seq data analysis and gene network construction. A large number of genes were found differentially expressed in muscle for EAA (or BCAA), corresponding to the fact that most EAAs or BCAAs are metabolized in muscle tissue [29]. As thousands of genes were used for the network analyses ( Figures S3-7), it was not easy to identify the most signi cant node genes from the network map. Functional enrichment of genes generated GO and KEGG pathways that played central roles in the regulation of amino acid contents. As shown in Figure 5A, in muscle tissue for EAA or BCAA trait, various pathways were enriched, such as "skeletal muscle cell differentiation," "skeletal muscle tissue development," "striated muscle tissue development," "myotube cell development" and "regulation of protein ubiquitination." KEGG enrichment for EAA or BCAA in muscle resulted in 36 pathways, including "valine, leucine and isoleucine degradation," "mTOR signaling pathway," "fatty acid metabolism" and "alanine, aspartate and glutamate metabolism." Enrichment results for the FLA trait in brain tissue showed a distinct pattern of pathways related to stem cell or hematopoietic stem cell differentiation ( Figure 5B), including "embryonic skeletal system development," "regulation of hemopoiesis," and "stem cell differentiation." A number of neuro-related pathways ("nerve development," "neural crest cell migration," and "positive regulation of neurogenesis") were also observed in the GO enrichment network of FLA, partially because some amino acids (e.g., GLY) within the FLA category served as neurotransmitters [30]. KEGG enrichment for the FLA trait in brain tissue contained 20 pathways comprising "wnt signaling pathway," "neuroactive ligand-receptor interaction," "glycine, serine and threonine metabolism" and several signaling pathways. Only one pathway, "glutathione metabolism," was enriched in liver tissue for the FLA trait, while glutathione is a tripeptide that plays a pivotal role in critical physiological processes affecting amino acid contents [31]. These network results will facilitate further research on the hub genes and pathways that are involved in the regulation of gene expression associated with the relevant traits.

Gene identi cation with DMP and integration incorporating DEGs
It has been widely accepted that genes regulating amino acid metabolism are closely related to other pathways, as many products are shared in the pathways of fatty acid metabolism, growth regulation and other physiological functions. As shown in Table S18, 69 pathways were enriched for the EAA or BCAA trait. Interestingly, several amino acid metabolism related pathways were enriched, such as "lysine degradation," "valine, leucine and isoleucine degradation," "beta-alanine metabolism," and "arginine and proline metabolism." Multiple important biological processes were also screened out by KEGG enrichment, indicating the wide interactions among these physiological functions, such as "pyrimidine and purine metabolism," "insulin signaling pathway," "adipocytokine signaling pathway" and "citrate cycle (TCA cycle)." Similarly, 66 pathways were enriched for the FLA trait (Table S18), which showed comparable patterns with the above results from EAA or BCAA. The widespread methylation differences in promoter regions of selected genes indicated that epigenomic features were common in amino acid metabolism regulation.
Among those interactive results shared by DMP and DGE, genes with more highly methylated promoters and decreased expression levels (or vice versa) were paid more attention due to their multi-omics evidence. Taking the gene agtpbp1 (cytosolic carboxypeptidase 1) for example, the promoter showed a higher methylation ratio in the H group of EAA (or BCAA), corresponding to decreased expression ( Figure  7A). Likewise, for the integrated analysis of the FLA trait, the gene gpt2l (alanine aminotransferase 2-like) was screened out with decreased methylation ratio and elevated expression in the H group. The protein alanine aminotransferase 2-like catalyzes the reversible transamination between alanine and 2oxoglutarate to form pyruvate and glutamate, engaging in metabolism of multiple amino acids.

Conclusions
In this study, we conducted the GWAS,transcriptome and whole genome methylation analysis on muscular amino acid content in C. carpio. The GWAS investigation of amino acid content revealed some genes that are related with amino acid metabolism. Through studies of the RNA-Seq results and gene pro ling with DMPs, we uncovered key genes which were enriched in EAA, BCAA and FLA metabolism pathways. Integrative results of RNA-Seq and DNA methylation indicated signi cant correlations at two multi-omics levels. The comprehensive analysis also identi ed genes related to amino acid synthesis, transportation and utilization. It is desirable to validate these results by extensive sequencing with a larger population. The results will accelerate the application of molecular selection breeding for common carp eshy with high nutritional value.

Sample collection
A total of 199 individuals of Yellow River carp were randomly collected from Henan Academy of Fishery Sciences, Henan, China. Based on statistical power calculation by GCTA [32], these samples could be used for reliable GWAS analysis. All shes were euthanized in MS222 solution before the samples were collected, and processing was nished within several minutes for each sh. For each individual, the blood sample was taken for molecular analysis. For amino acid quanti cation, dorsal muscle tissue was obtained for amino acid content determination. Brain, liver and muscle tissues from 20 samples were collected and used for RNA extraction.

SNP genotyping
The genomic DNA of 199 samples was extracted from the blood for SNP analysis. The quality and concentration of DNA were examined by agarose gel electrophoresis and colorimetry. The genotyping and quality control were conducted following the previously reported procedures [19].
Genome-wide association analysis The TASSEL version 5.0 software [33] was applied for genome-wide association analysis of amino acid content using the genotyping data. The signi cant P-value threshold was set to 5.241× 10 -7 , and the SNPs associated with amino acid content were selected. The Manhattan plots, Q-Q plots and SNP density were produced by the CMplot package.
Transcriptome sequencing and differential expression analysis Brain, liver and muscle tissues of 20 sh were used to perform the RNA sequencing based on the three respective traits EAA, BCAA and FLA contents under RNase-free conditions. According to the EAA, BCAA and FLA contents, three sh with the overall lowest amino acids content were selected as the control group, and correspondingly three sh with the highest contents were selected as the experimental group. The total RNA was isolated, and the RNA quantity was veri ed using 1.5% agarose gel. A cDNA library was constructed and sequenced through an Illumina HiSeq2500 sequencing system (Illumina, San Diego, CA, USA) and 150 bp paired-end reads were produced.
Low-quality reads and adaptors were removed through Trimmomatic v.0.32 [34]. A Bowtie index was built by Bowtie2 v.2.3.4.2 [35] and Tophat2 was used to aligned transcriptome data to the genome sequence [36]. Samtools [37] was applied to index the bam les, and Cu inks [38] was applied to assemble individual transcripts from RNA-Seq data. Individual transcripts were then merged using Cuffmerge to generate an integrated assembly. Based on FPKM, differentially expressed transcripts were identi ed using Cuffdiff. The differentially expressed genes were inferred from a volcano plot that showed the overall distribution using the R software. The genes with log 2 fold-change >1 and the q value < 0.05 were identi ed as signi cant expression abundance. The heatmaps and Venn diagram were constructed by the pheatmap package and VennDiagram package, respectively. The relationships between Gene Ontology (GO) pathways enriched by differentially expressed genes were demonstrated by the Cytoscape software [39].
DNA methylation sequencing and differential methylation analysis The genomic DNA was used for whole genome bisul te sequencing (WGBS). The genomic DNA was sonicated with sodium sul te, which rst converted unmethylated cytosine to uracil and then to thymine [40]. WGBS was conducted according to previously reported method [19]. Bismark [41] was applied to align the methylation data to the reference genome, and swDMR [42] was used to analyze the differentially methylated regions (DMR) through a sliding window method. The process of DMR detection and annotation followed the protocols that were previously reported [15,19]. The correlation analysis of DEG and DMP was performed using the R package ggplot2.     The network of signi cantly enriched GO terms in muscle and brain tissue for EAA (or BCAA) content and FLA trait. The thickness of the edge between the GO terms represents the Kappa Score between the terms.

Abbreviations
The q-value of the GO terms is manifested through the color of each node. The size of each node shows the number of enriched genes for every GO term. The GO terms with q-value < 0.05 are displayed.  Pearson's correlation of gene expression fold changes and differential methylation rates for DEGs. Scatter plots (A) for EAA (or BCAA) trait in muscle tissues; (B) for FLA trait in brain tissues. The X-axis means difference in methylation ratio between high and low EAA (or BCAA) content groups. The Y-axis indicates the logarithm of fold changes (log2FC). The red dots indicate the negatively correlated genes, while the blue dots represent positively correlated genes.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. NC3RsARRIVEGuidelinesChecklist.pdf Supplementarytables0428.xlsx