A Cluster of Genes at the Terminal Region of Chromosome 1 Contributes to Silk Yield in Bombyx Mori.


 Background:

The silk yield is the most economically important trait in sericulture. In the domestication of the silkworm, traits related to silk yield are the main selective targets. In previous studies, linkage and association analyses were performed, which narrowed the region of the locus controlling the cocoon shell weight (CSW), an important index of silk yield, to the interval of 0–134 kb on the 1st chromosome.
Results

In this study, selection pressure analysis showed that most of the loci in this interval were co-selected during silkworm domestication. Association analysis conducted with 95 strains found that there is a 110 kb region that was significantly associated with CSW. An expression profile correlation analysis of eight genes in the region showed that their expression patterns were very similar. Sequence variation analysis detected mutations in the exon regions of BMgn002066 and BMgn002073, leading to premature termination of translation. The sequence variation was significantly correlated with CSW. In addition, another two genes, BMgn002067 and BMgn002071, had significantly different expression levels in the posterior silk glands of 11 high-yield strains and 11 low-yield strains. Therefore, we speculate that the abovementioned four genes and the gene BmAbl1 detected in previous studies may co-regulate CSW.
Conclusions

This study is the first to identify the gene clusters that affect CSW by forward genetics, thereby providing new evidence for gene clusters that regulate quantitative traits.

Similar to the yield traits of most other species, silk yield carries the genetic signature of quantitative traits and is regulated by multiple genes [5]. Researchers have used omics analyses such as transcriptome and proteome [6][7][8][9][10] and forward genetics to explore the genes that regulate silk yield.
However, because the silkworm has complex genetic characteristics such as female complete linkage and other factors that could affect yield-related indicators, it is di cult to formulate suitable mapping materials and methods in forward genetic studies, making it di cult to research the genetic basis of silk yield. Therefore, over the past 100 years, only a few major genes have been detected, including BmGlcNase1 and BmAbl1 [11,12]. Since the inheritance of silk yield traits is dominated by additive effects [13] and the few identi ed genes have limited effects on molecular breeding, it is necessary to identify additional silk yield controlling genes.
In the previous work, our research group used linkage analysis to circumvent the effects of gender in order to locate a quantitative trait loci (QTL) at 0-134k of the 1st chromosome (Chr1) and found that BmAbl1 (BMgn002070) was signi cantly related to the CSW in this region [11]. In the course of this research, we preliminarily found that the mapping region seemed to have been selected as a whole unit.
In the classical genetic studies on the economically important traits of silkworm, Chr1 has long been regarded as the chromosome where the major effect loci are located. Many mapping studies indicated that Chr1 explained the largest percentage of the phenotypic variance [13][14][15][16][17][18] where the additive effect was dominant, implying that more genes on Chr1 contributed to CSW [13]. Therefore, it is important to decipher this region in detail to identify more genes controlling silk yield.
In this study, we conducted an overall examination of the 0-134 kb region at the terminal portion of Chr1 and performed expression and sequence analyses of genes in the region in order to reveal the signi cance of the terminal portion of Chr1 in silk yield.

Insects
Low-yield strain IS-Dazao, high-yield strain 872B, and 93 other low-yield and high-yield strains were all from the Silkworm Gene Bank of Southwest University, China. The larvae were fed with fresh mulberry leaves under a 12-h/12-h light/dark photoperiod at 25°C.

Expression analysis
The materials used in the temporal expression pro les were IS-Dazao and 872B posterior silk glands of 5th instar larvae. The glands were dissected in physiological saline at the same timepoint of each day of the 5th instar and frozen in liquid nitrogen. The posterior silk glands used for multi-strain association analysis were taken from 22 strains on day 3 of the 5th instar. Tissues were lysed using TRIzol (Invitrogen, Carlsbad, CA), and total RNA was extracted according to the manufacturer's instructions, followed by reverse transcription of the RNA using the PrimeScript RT reagent Kit with gDNA Eraser kit to obtain cDNA for each tissue for subsequent quantitative analysis.
We performed RT-qPCR using the Bio-rad CFX96 sequence detection system with an iTaqSYBR Green (Bio-rad) according to the manufacturer's instructions. The B. mori ribosomal protein L3 (rpL3, GenBank Accession Number AY769270.1) was used as the reference, and the primers used in this process are listed in the supporting Table 1. Each sample had three or more biological replicates and technical replicates, and the expression levels of the genes were analyzed using Student's t test.

Association analysis
Association analysis involved silkworm strains with different yield traits, and the CSWs were weighed. The indels were searched in the initial mapping region. Speci c primers were designed based on the anking sequences and sent to Huada Gene Synthesis. Primers were then ampli ed in the parental genomic samples to detect ampli cation speci city and polymorphisms. Polymorphic markers were developed for selective genotyping. This is a method of genetic mapping based on linkage disequilibrium. We classi ed the above developed indel markers in the 95 strains and conducted association analysis based on their phenotypic data. The genotype of the strain was classi ed, and the signi cance was judged by the degree of association between the band type and CSW. The statistical results were analyzed by one-way ANOVA (** = P < 0.01). The primer sequences are listed in supporting Table 1.

Sequence cloning
PCR products were recovered using the Glue Recovery Kit (OMEGA) according to manufacturer's instructions. After recovery, the product fragment was inserted into a PMD19-T vector (Takara) by solution I and ligated at 16°C for 12 h. The ligation product was transferred into Escherichia coli competent cells. Sequences were obtained by cloning and picking positive colonies for sequencing.

Selective pressure analysis
SNP calling was performed using resequencing data of seven wild silkworm strains, 42 local strains, and 87 improved strains for subsequent selection pressure analysis [19]. We calculated the population nucleotide diversity (π) of wild silkworms, local silkworms, and improved silkworms with a window size of 200 kb and a step size of 100 bp. The Fst values of the differentiation degree were calculated to determine the degree of selection for the gene.

Results
The terminal region of Chr1 was subject to domestication selection In order to study whether the 0-134 kb region of Chr1 (Fig. 1a) was subject to selection during domestication, we selected wild strains, local strains, and improved strains to analyze the nucleotide diversity of this region. The results showed that the nucleotide diversity (π) of wild silkworm from 23.2k of Chr1 was much higher than that of local and improved strains. The average π value of wild silkworms was 0.0090, while being only 0.0044 and 0.0041 in local and improved strains, respectively. The domestication process from wild strains to local strains reduced the π of wild silkworms by 51%, while the π value of the breeding process from local strains to improved strains was reduced by only 7% (Fig. 1b). This result suggests that the entire region was subjected to purifying selection, and the reduction of nucleotide diversity mainly occurred in the domestication stage. In order to further verify the above inferences, we used wild silkworm strains vs local strains and local strains vs improved strains as materials to simulate the domestication process and the breeding process and analyzed the degree of sequence divergence (Fst) in the candidate region. The results showed that consistent with the analysis of nucleotide diversity, the value of Fst in the range of 0-23.2k was not very different, and signi cant divergence began at the 23.2k point. The highest Fst of wild silkworm strains vs local strains was 0.793, with an average of 0.392; the highest Fst of local strains vs improved strains was 0.301, with an average of 0.063 (Fig. 1c). This result suggested that the loci were selected during the domestication process rather than during the breeding process. Since the reduction of nucleotide diversity may also be affected by factors such as genetic drift and bottleneck effects, we calculated Tajima's D in this region. The data showed that Tajima's D of local strains was always less than 0 in this region, suggesting that the decrease of nucleotide diversity in this region during domestication was indeed the result of purifying selection. Moreover, we also noticed that the value of Tajima's D of the wild population was always less than 0, implying that this region was also subject to purifying selection in wild silkworms. It is necessary to conduct an in-depth study on the speci c reasons and the signi cance of this result (Fig. 1d). In summary, the analysis shows that the region of Chr1 from 23.2 kb to 134 kb was subjected to selection during domestication, and that selection in the region has maintained its integrity.
The sequence variation in this region is associated to the CSW In sericulture, the CSW is the most important index of silk yield. Therefore, CSW was our rst consideration for breeding target traits. Whether this locus is associated with CSW needed to be further explored through association analysis between polymorphic loci and CSW. In order to nd the polymorphic loci, we sequenced the entire region in the low-yield strain IS-Dazao and the high-yield strain 872B and found 25 indels. We then genotyped these indels in 95 different silkworm strains with known CSW to detect the association between the variants and silk yield. Through association analysis, we found that 23 indels in the region showed a high level of association with CSW ( Fig. 2a). By comparing the position information, 14 of the 25 indel markers were located in the gene region of interest. Eleven indels were located in the intron region; three were located in the exon region, and 11 were located in the intergenic region, including two indels located in the upstream 5K regulatory region (Fig. 2b). Using the indel of this region as a marker for linkage disequilibrium analysis, we found that the degree of linkage disequilibrium in this region was very high, with an average D of 0.86 (Fig. 2c), implying that the region as a whole has been co-selected. In summary, the region screened by the association analysis was similar to the results of the selection pressure analysis. The sequence variation in the interval from 25.2 kb to 134 kb was related to CSW, and the region is in a state of linkage disequilibrium (Fig. 2a).
The expression patterns of genes in the region are signi cantly similar Selection pressure analysis, association analysis, and linkage disequilibrium analysis showed that the region may be co-selected for improving CSW. We therefore speculated that multiple genes in this region co-regulated CSW. To explore this, we rst searched for genes in the region and found that the functions of these genes were mostly related to cell division and growth. Multiple genes working together are often accompanied by similar expression pro les. We determined the expression levels of genes in the region. The silk gland is the main organ secreting silk protein. The middle silk gland secretes sericin, and the posterior silk gland secretes silk broin. Silk broin is the main raw material used in the silk industry; so, the posterior silk gland was used as the material in this research. Because the 5th instar of the silkworm is the critical period for silk gland development and silk protein synthesis, we chose the 5th instar posterior silk gland as the experimental material to analyze the expression pro les of genes. The qPCR analysis results for eight genes in the region found that these genes had similar expression pro les. Their expression levels gradually increased with the development of silk glands, and their expression levels reached a peak on day 6 of the 5th instar, which was the most rapid period of silk gland development. We then calculated the correlation coe cients between gene expression pro les and found that the expression pro les of each gene pair had extremely high correlations, and the average Pearson's correlation coe cient (r) was always higher than 0.5 (Fig. 3b). The frequency of the correlation coe cient (r) between genes in the 0.8-1.0 interval was as high as 69.4% (Fig. 3c). The above results suggest that genes work together to promote silk gland development.
We matched the physical locations of these genes and the values of Fst in the region where the genes were located and found that these genes had a high degree of population differentiation between wild and local strains (supporting Table 2). The average values of Fst of BMgn002068, BMgn002066, BMgn002067, BMgn002073, and BMgn002070 were all above 0.45 (supporting Table 2). We annotated the functions of these genes and found that many of them were related to energy metabolism and cell growth and division (supporting Table 2). According to the results of correlation analysis of the expression levels of multiple genes and the correlations of function annotation, there may be a regulatory mechanism involving multiple genes that co-regulates the CSW in the candidate region. There were 14 markers located in the gene region and three markers located in the exon region in the candidate region that were signi cantly associated with CSW (Fig. 2b). In order to explore whether an indel located in the exon region affected the function of the corresponding gene, we performed a sequence blast between high-yield and low-yield silkworm strains. The results showed that BMgn002066 and BMgn002073 had 1 bp and 11 bp deletions in the 6th and 2nd exon regions, respectively. These sequence variations led to premature termination of translation and breakage of the domain, which in turn caused signi cant changes in the protein structure (Fig. 4a, b). In order to explore whether the sequence variations were related to CSW, we divided 95 association analysis materials into two groups based on the sequence type, namely H haplotype and L haplotype, and performed an association analysis with CSW. There was a signi cant difference between H and L haplotypes CSW. The mutation in BMgn002066 reduced CSW by 0.069 g, which was about 27% of the H haplotye CSW. The mutation in BMgn002073 reduced CSW by 0.080 g, which was about 38% of the H haplotye CSW (Fig. 4a, b). The above results indicate that the two sequence variants that can cause changes in protein structure are signi cantly related to CSW and can explain yield trait variation to a large extent.
There is often a certain degree of association between a trait and the expression level of the gene that regulates the trait. Therefore, we performed multi-strain expression level analysis on the seven genes detected in the selection pressure analysis (excluding the published gene BMgn002070) to further screen for co-regulatory genes. In the expression pro le analysis, we found that the genes reached the peak of expression level the day before mounting the cocooning frame, i.e., day 6 of the 5th instar. Therefore, we selected 11 high-yield strains and 11 low-yield strains to determine the gene expression levels in the posterior silk glands on day 6 of the 5th instar. The results showed that the expression levels of genes BMgn002067 and BMgn002071 showed signi cant differences. This result indicates that these two genes may co-regulate silk gland development, thereby affecting the CSW.

Discussion
In this study, we found a region signi cantly associated with CSW through arti cial selection analysis and association analysis. Through sequence and expression pro le analyses, we found that genes within the region appeared to function together. The expression levels of BMgn002067 and BMgn002071 were signi cantly different between high-yield and low-yield strains, while the exon regions of BMgn002066 and BMgn002073 were mutated, leading to changes in the domain.
Among the genes with domain mutations, BMgn002066 encodes adenylate and guanylate cyclase, an important signal transduction enzyme. After activation, it can not only activate the NO-SGC-cGMP signaling pathway but also inhibit the TGF-β signaling pathway [20][21][22][23]. cGMP is an important second messenger in vivo and can regulate downstream effector molecules such as protein kinase G. cGMPdependent phosphodiesterase and cGMP-gated ion channels participate in a series of physiological or pathological reactions, including vasodilatation of blood vessels, inhibition of platelet aggregation, inhibition of cell proliferation, and other physiological regulation processes [21]. The inhibition of TGF-β signaling pathway can induce physiological effects on the inhibition of tissue brosis and cell proliferation. In the research on rectal and thyroid cancer, it has been found that adenylate and guanylate cyclase plays an important role in cancer proliferation [21]. As the silk gland undergoes cell division during the embryonic period, the number of cells is signi cantly related to the silk yield, and thus, we speculate that BMgn002066 affects the development of the silk gland by affecting cell proliferation. BMgn002073 encodes an SLC4-like anion exchanger. SLC4 gene products play an important role in CO(2) transport in red blood cells, in H(+) and HCO(3)(-) absorption or secretion by various epithelial cells, and in regulating cell volume and intracellular pH [24,25,26]. Studies have shown that AE2 of the SLC4 family can regulate cell volume increase by Cl(-) uptake [26]. In the silkworm, enlargement of silk gland cells occurs during the larval stage after the completion of silk gland cell division in the embryo. Combined with the function of SLC4 to regulate the cell volume, we speculate that this enzyme may affect the silk gland development by regulating the cell volume.
The genes BMgn002067 and BMgn002071 encode α tubulin-N-acetyltransferase 1 (ATAT1) and 5formyltetrahydrofolate cyclase (MTHFs), respectively, and there were signi cant differences in the expression level between high-and low-yield strains. ATAT1 is used to catalyze the acetylation of tubulin and is clustered in the nucleus during the G1-G2 phase. At late mitosis, ATAT1 co-locates with chromatids and spindles and eventually migrates to daughter nuclei, newly synthesized centrioles, and midbodies [27]. The speci c distribution of ATAT1 in the cell cycle suggests that ATAT1 has multiple functions, including microtubule acetylation, RNA transcription activity, microtubule cut-off, and cytokinesis completion [28][29][30][31][32][33]. In the study of the function of ATAT1 in cell division, researchers found that knocking out ATAT1 in Tetrahymena slowed its growth rate [34]. In the silkworm, silk gland development is closely related to cell division. Investigations have shown that the number of silk gland cells in high-yield strains is higher than that in low-yield strains. The number of silk gland cells does not change after the completion of silk gland cell division in the embryonic stage. ATAT1 functional research on the regulation of cell division indicates that this gene may regulate silk gland development by affecting embryonic mitosis. MTHFs are important enzymes in the pathway of folate metabolism, being involved in the metabolism of folate, purines, vitamins, and coenzyme factors. MTHFs regulate carbon ow through folate-dependent single-carbon metabolic networks [35,36]. This regulatory network provides carbon for the biosynthesis of purines, thymine, and amino acids and in uences DNA synthesis in vivo, thus in uencing cell division and maturation [37,38,39]. MTHFs have been demonstrated to inhibit cell growth in human MCF-7 breast cancer cells; it has been con rmed in mice that MTHFs are an essential part of the purinosome and provide conditions for cancer cells to rapidly synthesize purine nucleotides [40,41]. These studies of MTHFs show that they are closely related to cell division. In the silkworm, silk gland cell proliferation and growth directly affect silk gland development. Combined with the function of MTHFs in cell growth and division in other species, it can be inferred that genes that regulate silk yield are very likely to participate in energy metabolism and DNA synthesis. Therefore, it is possible that MTHFs can in uence the energy supply of silk gland development, provide carbon for DNA synthesis, and affect the division and maturation of silk gland cells, thus affecting the CSW.
In this study, multiple genes were shown to coordinate and regulate the same trait within a certain region. Generally, there are two types of gene clustering distributions. The rst type is a gene family in the form of a cluster in a region; this is the most common. The second type is where genes with similar functions are clustered together, although they are not necessarily in the same gene family [42]. Clustering distribution of genes has been studied in other species, for example, the conserved supergene loci affecting butter y diversity [43], the supergene loci affecting butter y mimicry [44], and a group of supergenes in uencing the shape of male water birds' neck hair [45]. Studies have shown that gene clustering is an effective gene regulation mode in biological evolution. Although some clustered genes belong to different gene families, they always show similar expression pro les. This may be due to the initiation of the regulation of gene expression or the structural changes of chromosomes. There is more than one gene in the same open box that can be activated and transcribed at the same time, thereby greatly improving the e ciency of transcription. There have also been studies on silkworms through mixed pool sequencing analysis; these have found that most of the genes related to silk gland development in the silkworm are clustered in the genome and have similar expression pro les [46]. The selected genes ATAT1, MTHFs, Adenylate and Guanylate cyclase, and SLC4-like anion exchanger appear in clusters along with the silk yield regulating gene BmAbl1 detected in previous studies, and the expression pro les are signi cantly similar. Based on this, we speculate that they may share the same enhancer to regulate the open reading frame, and the e ciency of multi-gene action can be greatly improved by regulating the expression of gene clusters at the same time. The simultaneous accumulation of gene effects can jointly regulate cell division, protein synthesis, and energy supply in order to regulate silk yield traits more e ciently. This nding implies that it is not only morphological traits but also a similar mode of regulation in quantitative traits that will provide a new reference for the study of quantitative traits.
The results of the domestication analysis in this study were very interesting. In the selection pressure analysis, we found that the location of the gene cluster received stronger selection during the domestication process than during the breeding process. This was indicated since from wild to local strains, CSW only increased from 0.06 g to about 0.15 g, while in the process of breeding, the CSW of improved strains reached up to 0.4 g [12]. The silk yield has been greatly improved during the breeding process, and the researchers therefore speculated that the breeding process may have entailed greater selection for silk yield-related genes. In addition, the results of a principal component analysis (PCA) of the silk gland transcriptome in a previous study showed that most of the differentially expressed genes had little difference in principal components between wild and local strains, while the difference between local and improved strains was greater [10]. The PCA results con rmed the above speculation. However, in this study, the loci of the gene cluster were more strongly selected during the domestication process. In response to the completely different results of this study, we found the same situation in a cotton yield study by consulting the literature. The genetic divergence degree of cotton from local strains to improved strains was only 0.04, while the Fst of wild and local strains was as high as 0.10. This suggests that the process of breeding has had little effect on the genetic diversity of cotton, while the process of domestication greatly reduced the genetic diversity of cotton [47]. From this result, we hypothesized that our result might be due to the fact that some important yield regulatory genes were selected and xed in advance during the domestication process to ensure the base yield.
This study has identi ed the rst gene cluster region mapped by the forward genetic method, providing evidence for the regulation of gene clusters in silk yield. However, the regulation mode of gene clusters in the region and the mechanism of how each gene regulates silk yield need to be studied further in future.

Conclusion
This study found that there is a cluster of genes at the terminal region of chromosome 1 that can jointly regulate the CSW of the silkworm. The genes within the cluster have similar expression pro les, and the region where the gene cluster is located is co-selected and is signi cantly related to CSW. Four genes related to CSW were identi ed through gene sequence variation and gene expression level analyses among multiple strains. These results are signi cant for our understanding of the genetic basis of silk yield, and the data offer a reference to direct the molecular breeding of silkworms.   Analysis of association and linkage disequilibrium of Chr1 0-134k a. The x-axis represents the physical location of Chr1; a dot represents an indel mark, and a black box represents the degree of linkage between the marks. The y-axis shows the degree of association with the CSW. The threshold value calculation method is p_value/number of markers = 0.05/25 = 0.002 and is marked with a black dotted line. b. The percentage of markers located in the region. Gray represents the markers located in the intergenic region, and orange represents the markers located in the gene region. The pie chart on the right is a detailed view of the markers of the gene region. The dark orange segment represents the proportion of markers located in the exon region among the markers in the gene region, and the light orange represents the proportion of markers located in the intron region. The pie chart on the left is a detailed view of the intergenic region in which the light gray part represents the proportion of markers located in the regulatory region to the number of markers in the intergenic region. c. Linkage disequilibrium heat map with 38 indel data as markers. The red box represents the tight linkage between the markers.

Figure 3
Expression pro les of eight genes in the candidate region. a Heat map of expression pro les of eight genes in the candidate region. Each row represents a gene, and the gene number is marked on the right side of the heat map. Each column represents a time point, from day 1 of the 5th instar to day 7 of the 5th instar, marked at the top of the heat map. The heat map colors represent the expression levels of the genes. Red represents a high expression level, and green represents a low expression level. The expression color scale is placed on the lower side of the heat map as a reference for the expression level.
b Correlation heat map of the expression level of eight genes. Both the abscissa and ordinate are eight genes in the region. x-axis: Gene ID BMgn002068, BMgn002065, BMgn002066, BMgn002067, BMgn002071, BMgn002072, BMgn002073, and BMgn002070, from left to right; y-axis: Gene ID from top to bottom: BMgn002068, BMgn002065, BMgn002066, BMgn002067, BMgn002071, BMgn002072, BMgn002073, and BMgn002070. Each small square represents the Pearson's correlation coe cient (r) value of the expression pro les of a pair of genes. The darker the color, the stronger the correlation. The r values were used to generate the color scale, and these are listed on the lower side of the heatmap as a numerical reference. c Frequency histogram of the Pearson's correlation coe cient (r) of gene expression levels. The x-axis represents the r value, and the y-axis on the left represents the frequency represented by a histogram. Figure 4