Transcriptome Analysis of Tailfins From Grass-Goldfish and Egg-Goldfish to Identify Genes Involved in Artificial Selection for Ornamental Purposes

Goldfish, one of the first animals domesticated for ornamental purposes, have undergone extensive morphological variation. Of these variations, the change in the number of tail fins and the loss of the dorsal fin are the most striking, given the important functions of these fins in fish. In the present study, a transcriptome analysis of the tail fins of Grass-goldfish (with a complete dorsal fin and a single tail fin) and Egg-goldfish (no dorsal fin and double tail fins) was carried out to determine sequence variants, to detect differential gene expression patterns, to characterize the effects of selection, and to identify target genes under selection. In total, 124,808,475 and 101,965,939 high-quality reads were obtained from the tail fins of Grass- and Egg-goldfish, respectively, and 114,796 unigenes were assembled from over 33.48 Gb of nucleotides. A large portion of the unigenes were related to various primary and secondary metabolite pathways, and the differentially expressed genes (DEGs) between the tail fins of Grass- and Egg-goldfish were also generally enriched in these pathways, for instance the PI3K−Akt signalling pathway, the MAPK signalling pathway, osteoclast differentiation, and dorso-ventral axis formation, all of which relate to cell proliferation, growth, and differentiation. The identified DEGs (HOXA2b, HOXB13a, paired mesoderm homeobox protein 1-like isoform X2 (PRRX2), zinc finger E-box-binding homeobox 1-like isoform X3 (ZEB1), and homeobox protein Meis1) presumably played an important role in the development of double tail fins during artificial selection for ornamental purposes.

group, more than 73% of unigenes were distributed in the “PI3K−Akt signalling pathway (1147 unigenes)”, “MAPK signalling pathway (857 unigenes)”, “Rap1 signalling pathway (809 unigenes)”, and “Ras signalling pathway (656 unigenes)”. Phosphatidylinositide-3-kinases


Introduction
Over the past 13,000 years of human history, the domestication of many kinds of animals and plants has played a fundamental role.
Successfully domesticated animals and plants not only provided the material foundation (e.g., clothing, food, and shelter) necessary for the development of human society; some domestic animals also became helpers and companions in daily human life (Diamond 2002). Compared with their wild ancestors, some domestic animals show huge differences in morphology, as exemplified by goldfish compared with crucian carp (Smartt 2001; Wang 1985; Wang 2000). Understanding the genetic basis of phenotypic variation, and especially identifying the target genes of anthropogenic selection during domestication, can provide insight into the processes of rapid evolution and improvement (Diamond 2002).
Goldfish (Carassius auratus) belong to the order Cypriniformes, family Cyprinidae, and genus Carassius. From unconscious breeding to conscious anthropogenic selection for ornamental purposes, the morphology of goldfish has diverged greatly from that of the wild crucian carp in less than 1,000 years of domestication (Wang 1985). Long-term artificial selection for ornamental purposes has made the external morphology of goldfish (eyes, fins, body shape, scales, etc.) considerably different from that of crucian carp, producing traits such as egg-shaped bodies, celestial or telescopic eyes, fancy tail fins, lionhead morphotypes, double tail fins, and absence of the dorsal fin, inter alia (Komiyama et al. 2009). Of all the morphological variations that have appeared during the domestication of goldfish from wild crucian carp, the change in the number of tail fins and the loss of the dorsal fin have attracted particular attention because of the important functions of these fins (Smartt 2001; Wang 2000). The dorsal fin helps maintain body balance during swimming, and most fishes cannot stay upright without it (Drucker et al. 2001). Dorsal finlessness appeared after double tail fins had been attained, which could compensate for the loss of the dorsal fin (Wang et al. 2013). Depending on the condition of the dorsal fin (retained or lost) and the tail fin (single or double), two terms are used to designate the breeds: Grass-goldfish and Egg-goldfish (Wang 2000). The morphology of the Grass-goldfish (slender body with a complete dorsal fin and a single tail fin; Fig. 1a) is less derived and more similar to that of the native Carassius than other breeds (Wang 1985; Wang 2000). The Egg-goldfish (Fig. 1b) always has an egg-shaped body and double tail fins, and its dorsal fin was lost during artificial selection (Wang 1985; Wang 2000).
These morphological differences, caused by artificial selection for ornamental purposes, leave traces in the genetic material (Diamond 2002). Analysis of mitochondrial DNA sequences reveals greater genetic and nucleotide diversity in Grass-goldfish than in Egg-goldfish, indicating that Grass-goldfish could have been the first domesticated breed and that Egg-goldfish are likely of more recent origin (Komiyama et al. 2009; Wang et al. 2013). Further studies on the function of nuclear genes have shown that homeobox genes play an important role in regulating the initiation and development of fins. The expression of the HoxA11b and HoxA13b genes during the growth and regeneration of pectoral and tail fins indicates that they are involved in the differentiation of fish osteoblasts (Shao et al. 2009). Additionally, research on the function of HoxB1b in Megalobrama amblycephala indicated that it is closely related to the development of the pectoral fin along the anteroposterior axis of the embryo (Pu et al. 2004). Introducing HoxD13 protein into zebrafish fins induced the formation of new cartilage tissue and a reduction of fin tissue in zebrafish embryos (Freitas et al. 2012).

In the present study, a comprehensive analysis based on transcriptome sequencing of the tail fins of Grass-goldfish and Egg-goldfish was carried out to determine sequence variants, to detect differentially expressed genes (DEGs) involved in the development of double tail fins, and to identify the target genes of artificial selection for ornamental purposes. In addition, simple sequence repeat (SSR) loci and markers were detected in the tail fin transcriptome to assist molecular marker-assisted breeding of goldfish. Through these extensive transcriptome analyses, we aim to clarify the regulatory effects of DEGs on the tail fin variation caused by anthropogenic selection for ornamental purposes.

De novo assembly and annotation
To obtain clean reads from the raw data, we removed adapter sequences, low-quality reads (defined as reads in which > 50% of bases have a Q value ≤ 10), and reads containing more than 5% unknown ('N') nucleotides. Subsequently, Trinity

We used the R package DESeq2 (Anders and Huber 2010) to perform differential expression analyses between Grass-goldfish and Egg-goldfish. Using negative binomial generalized linear models, DESeq2 provides methods to estimate the variance-mean dependence and to test for differential expression. Fold changes in transcript expression were expressed as log2 ratios of normalized mean counts, and P values were adjusted for multiple testing by calculating the FDR (false discovery rate). Significant differential expression was defined by the thresholds |log2(fold change)| ≥ 2 and FDR ≤ 0.05. To visualize differential expression between Grass-goldfish and Egg-goldfish, a volcano plot was drawn with log2(fold change) on the horizontal axis and −log10(P value) on the vertical axis.
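The FDR adjustment described above can be illustrated with a short sketch. The text does not name the adjustment procedure; the sketch assumes the Benjamini-Hochberg method, which is what DESeq2 applies by default when reporting adjusted P values. The function name `benjamini_hochberg` and the example P values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values (FDR) in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four unigenes.
pvals = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(pvals)
significant = [q <= 0.05 for q in fdr]  # FDR <= 0.05, as stated in the Methods
print(fdr, significant)
```

DESeq2 also applies independent filtering of low-count genes before adjustment, so its adjusted values can differ from applying this function to every P value.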
Enrichment analyses of the DEGs were respectively performed using the software Goatools

Clustering analysis of the expression patterns of the 2946 DEGs between Grass-goldfish and Egg-goldfish was performed using the R package heatmap2 (with Spearman's correlation coefficient between individuals and Pearson's correlation between unigenes) to infer functional similarities among the DEGs.
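The two correlation measures used for clustering can be sketched as follows. This is an illustration under assumptions, not the study's pipeline: it assumes that "1 − correlation" is used as the clustering distance (the text does not state the distance transform or linkage), and the expression matrix is hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def distance_matrix(vectors, corr):
    """Pairwise 1 - correlation (an assumed distance for hierarchical clustering)."""
    return [[1.0 - corr(u, v) for v in vectors] for u in vectors]


# Hypothetical normalized counts: rows = unigenes, columns = samples G1-G3, E1-E3.
expr = [
    [120.0, 130.0, 110.0, 20.0, 25.0, 18.0],
    [15.0, 12.0, 14.0, 90.0, 85.0, 95.0],
    [100.0, 95.0, 105.0, 30.0, 28.0, 35.0],
]
samples = [list(col) for col in zip(*expr)]
sample_dist = distance_matrix(samples, spearman)  # between individuals
gene_dist = distance_matrix(expr, pearson)  # between unigenes
```

Spearman's coefficient depends only on ranks, which makes the sample-level comparison less sensitive to a few highly expressed unigenes, while Pearson's coefficient on the gene level groups unigenes whose expression profiles rise and fall together.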

Simple sequence repeats (SSRs) analysis
The MISA program (Beier et al. 2017) was used to identify SSRs in goldfish unigenes longer than 1000 bp and to design primers for the predicted SSRs. The numbers of predicted SSRs of each type (mono-, di-, tri-, tetra-, penta-, and hexanucleotide) were counted and displayed in a histogram.
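The kind of search MISA performs can be sketched in a few lines. This is a simplified stand-in, not MISA itself: the minimum repeat numbers below are assumed MISA-style settings (the study does not list its parameters), only perfect repeats are reported, compound SSRs are not merged, and primer design is not shown.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumed minimum repeat numbers per unit size (1 = mono ... 6 = hexa).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
CLASS_NAMES: Dict[int, str] = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already counted as mono
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def count_ssr_classes(unigenes: Iterable[str], min_length: int = 1000) -> Counter:
    """Tally SSRs by class for unigenes longer than min_length bp (histogram input)."""
    counts: Counter = Counter()
    for seq in unigenes:
        if len(seq) > min_length:
            for _, motif, _ in find_ssrs(seq):
                counts[CLASS_NAMES[len(motif)]] += 1
    return counts


example = "GG" + "AT" * 6 + "C" + "CAG" * 5 + "T"
print(find_ssrs(example))  # [(2, 'AT', 6), (15, 'CAG', 5)]
```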

Transcriptome sequencing and de novo assembly
Three biological replicate libraries were generated and sequenced from the tail fins of Grass-goldfish and of Egg-goldfish (Table 1), and all raw reads were deposited in the NCBI SRA under BioProject accession number PRJNA666023. A total of 54,123,840, 52,650,612, and 54,797,604 raw reads were obtained from the three Grass-goldfish specimens, and 45,551,450, 49,700,170, and 50,184,718 raw reads from the three Egg-goldfish specimens. Adapters, reads with more than 5% N content, and low-quality reads (defined as reads containing > 50% bases with Q value ≤ 10) were removed. After quality control, approximately 18.40 and 15.08 Gb of clean nucleotides in total were obtained for Grass-goldfish and Egg-goldfish, respectively.

In the GO category, 1,530 unigenes were classified into the three primary GO categories "biological process", "cellular component", and "molecular function", and further subdivided into 56 secondary-level terms (Fig. 3). In "biological process", "cellular process" (1045 genes), "single-organism process" (829 genes), and "metabolic process" (803 genes) were the three secondary terms with the most unigenes. In "cellular component", the top three secondary terms were "cell part" (883 genes), "cell" (883 genes), and "organelle" (640 genes). In "molecular function", the secondary terms "binding" (820 genes) and "catalytic activity" (586 genes) contained the most unigenes.
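The two read-level filters described above can be expressed as a small function. This sketch assumes Phred+33 quality encoding and single-end, 4-line FASTQ records; adapter trimming and the handling of read pairs are not shown, and the function names and example records are hypothetical.

```python
from typing import Iterator, List, Tuple


def passes_qc(seq: str, qual: str, phred_offset: int = 33, low_q: int = 10,
              max_low_frac: float = 0.5, max_n_frac: float = 0.05) -> bool:
    """Keep a read unless > 50% of bases have Q <= 10 or > 5% of bases are N."""
    n = len(seq)
    # Decode each ASCII quality character into a Phred score and count low ones.
    low = sum(1 for ch in qual if ord(ch) - phred_offset <= low_q)
    n_count = seq.upper().count("N")
    return low / n <= max_low_frac and n_count / n <= max_n_frac


def filter_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) for 4-line FASTQ records that pass QC."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _, qual = (line.strip() for line in lines[i:i + 4])
        if passes_qc(seq, qual):
            yield header, seq, qual


records = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read2", "ACGTNCGTAC", "+", "IIIIIIIIII",  # 10% N: removed
    "@read3", "ACGTACGTAC", "+", "++++++IIII",  # 60% of bases at Q10: removed
]
kept = [header for header, _, _ in filter_fastq(records)]
print(kept)  # ['@read1']
```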

Analysis of the DEGs (differentially expressed genes)
To identify DEGs between Grass-goldfish and Egg-goldfish, the expression level of each unigene in the tail fins of the two breeds was compared pairwise, and unigenes with fold change ≥ 2 and FDR < 0.01 were retained (Table 3). A total of 7516 DEGs were detected, of which 2946 unigenes showed substantially different expression in the tail fins between Grass-goldfish and Egg-goldfish. Among these differentially expressed genes, 1922 unigenes were up-regulated and 1024 unigenes were down-regulated in Grass-goldfish compared with Egg-goldfish. The differentially expressed unigenes from the tail fins of Grass-goldfish vs. Egg-goldfish were further mapped and classified

Large numbers of DEGs were also mapped to various primary and secondary metabolite pathways (e.g., ko01100, ko04151, ko04010, and ko04014) in the KEGG pathway enrichment analyses of DEGs between Grass- and Egg-goldfish (Table S1 and

were up-regulated (Table S1), while interferon gamma receptor 1-2 (TRINITY_DN94125_c1_g2), fos-related antigen 2-like (TRINITY_DN85588_c0_g1), interleukin-1 beta-2 (TRINITY_DN104072_c0_g1), fos-related antigen 1-like isoform X2 (TRINITY_DN76364_c0_g1), retinoic acid receptor responder protein 3-like (TRINITY_DN16362_c0_g1), and suppressor of cytokine signalling 1-like (TRINITY_DN83317_c0_g1) were down-regulated (Table S2) in the tail fin of Grass-goldfish.
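The up/down classification and the volcano-plot coordinates can be sketched as below. The sign convention is an assumption (log2 fold change computed as Grass-goldfish relative to Egg-goldfish, so positive values mean up-regulated in Grass-goldfish). The cut-offs are parameters because the Methods state |log2FC| ≥ 2 with FDR ≤ 0.05, while this paragraph states fold change ≥ 2 (that is, |log2FC| ≥ 1) with FDR < 0.01. Unigene names and values are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def classify(log2fc: float, padj: float, min_abs_log2fc: float, max_fdr: float) -> str:
    """Label a unigene 'up', 'down', or 'ns' (not significant)."""
    if padj <= max_fdr and abs(log2fc) >= min_abs_log2fc:
        return "up" if log2fc > 0 else "down"
    return "ns"


def volcano_coords(log2fc: float, pvalue: float) -> Tuple[float, float]:
    """x = log2 fold change, y = -log10 P value."""
    return log2fc, -math.log10(pvalue)


# Hypothetical results: unigene -> (log2 fold change Grass/Egg, P value, FDR).
results: Dict[str, Tuple[float, float, float]] = {
    "unigene_A": (3.2, 1e-6, 1e-4),
    "unigene_B": (-2.7, 1e-5, 5e-4),
    "unigene_C": (0.8, 0.02, 0.15),
}
# Cut-offs from the Methods: |log2FC| >= 2 and FDR <= 0.05.
labels = {g: classify(lfc, q, 2.0, 0.05) for g, (lfc, _, q) in results.items()}
print(Counter(labels.values()))
points = {g: volcano_coords(lfc, p) for g, (lfc, p, _) in results.items()}
```

Counting the "up" and "down" labels over all unigenes is how totals such as the 1922 up-regulated and 1024 down-regulated unigenes would be tallied under whichever cut-offs apply.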

Discussion
Compared with the wild crucian carp, the variation in the tail and dorsal fins is the most striking, owing to the important biological functions of these fins in fish. In the present study, transcriptome analyses of the tail fins of Grass-goldfish (single tail fin) and Egg-goldfish (double tail fins) were performed using RNA-Seq. After quality control, approximately 77.25% and 70.11% of the raw reads were retained as clean reads for Grass-goldfish and Egg-goldfish, respectively. The overall GC contents of the Grass-goldfish (42.78%) and Egg-goldfish (43.47%) transcriptomes were both lower than their AT contents, consistent with the 42.9% GC content of the mitochondrial genome of the wild crucian carp (Ge et al. 2020).
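The retention percentages can be checked against the per-replicate raw read counts given in the Results and the clean read totals given in the Abstract. The short calculation below assumes that the retention rate is clean reads divided by the summed raw reads of the three replicates, with both counted in the same unit.

```python
raw_reads = {
    "Grass-goldfish": [54_123_840, 52_650_612, 54_797_604],
    "Egg-goldfish": [45_551_450, 49_700_170, 50_184_718],
}
clean_reads = {"Grass-goldfish": 124_808_475, "Egg-goldfish": 101_965_939}


def retention_percent(breed: str) -> float:
    """Clean reads as a percentage of the summed raw reads of three replicates."""
    return 100 * clean_reads[breed] / sum(raw_reads[breed])


for breed in raw_reads:
    print(f"{breed}: {sum(raw_reads[breed]):,} raw reads, "
          f"{retention_percent(breed):.2f}% retained")
# Grass-goldfish: 161,572,056 raw reads, 77.25% retained
# Egg-goldfish: 145,436,338 raw reads, 70.11% retained
```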
Subsequently, 114,796 unigenes with a total length of 89,545,010 bp were assembled de novo from the clean reads, and the proportions of reads mapping back to the assembled transcripts were 76.68% for Grass-goldfish and 78.94% for Egg-goldfish. The average (780 bp), N50 (861 bp), and maximum (18,054 bp) unigene lengths were also calculated and support the reliability of the transcriptome data.
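The N50 statistic used above is the length L such that unigenes of length ≥ L together contain at least half of all assembled bases. A minimal sketch follows; the toy lengths are hypothetical, since the per-unigene lengths are not given here, but the mean length can be recomputed from the reported totals.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Smallest length among the longest sequences that cover >= 50% of total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy unigene lengths (hypothetical, not the study's data).
toy = [100, 200, 300, 400]
print(n50(toy), sum(toy) / len(toy))  # 300 250.0

# Mean unigene length from the reported totals.
print(round(89_545_010 / 114_796))  # 780
```

N50 exceeds the mean (861 vs. 780 bp here) whenever a minority of long unigenes holds a large share of the assembled bases.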

Conclusion
In total, 124,808,475 and 101,965,939 high-quality reads were obtained from the tail fins of Grass- and Egg-goldfish, respectively, and 114,796 unigenes were assembled from over 33.48 Gb of nucleotides. A large portion of the unigenes were related to various primary and secondary metabolite pathways, and the DEGs between the tail fins of Grass- and Egg-goldfish were also generally enriched in these pathways, for instance the PI3K−Akt signalling pathway, the MAPK signalling pathway, osteoclast differentiation, and dorso-ventral axis formation, all of which relate to cell proliferation, growth, and differentiation. The identified DEGs, including HOXA2b, HOXB13a, paired mesoderm homeobox protein 1-like isoform X2 (PRRX2), zinc finger E-box-binding homeobox 1-like isoform X3 (ZEB1), and homeobox protein Meis1, probably played an important role in the development of double tail fins during artificial selection for ornamental purposes. The results of the present study provide key information on candidate genes that may regulate double tail fin development and should facilitate marker-assisted selection and breeding of goldfish.

Declarations
The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. All data generated or analysed during this study are included in this published article (and its supplementary information files).

Funding
This study was supported by the National Natural Science Foundation of China (Grant No. 31402288).

Competing interests
The authors declare no competing interests.