Bioinformatics and Expression Pattern Analysis of Soybean Fatty Acid Desaturase Family Gene

Background To clarify the homology and structural differences among the seven genes of the soybean fatty acid desaturase 2 (FAD2) family, and the relationship between the expression level of each gene and oleic acid content at different stages of grain development, the seven genes in this family were studied through bioinformatics analysis and evaluation of expression patterns.
Results Database analysis, phylogenetic tree analysis, transmembrane structure prediction, amino acid sequence alignment, intron-exon structure analysis, and protein motif analysis were performed for the seven genes in the FAD2 family. The results showed that GmFAD2-1 and GmFAD2-2b differ structurally from the other genes, and it is speculated that their functions have also diverged slightly. Analysis of gene expression patterns at different stages of soybean grain development showed that the expression levels of the family genes first increased, then decreased, and finally stabilized. Correlation analysis showed that gene expression was negatively correlated with oleic acid content, and the correlation coefficients of GmFAD2-1 and GmFAD2-2b were the largest, showing a significant negative correlation.

At present, many FAD genes have been identified in crops such as peanut [2], cotton [3], and soybean [4]. In 2014, Haun [5] et al. studied the soybean FAD2 family and increased the oleic acid content from 20% to 80%. At the same time, the content of linoleic acid, which is harmful to human health, was reduced from 50% to 4%, thus improving the quality of soybeans. In 2017, Wen [6] et al. raised the oleic acid content of flax seeds from 16% to more than 50% and decreased the linolenic acid and linoleic acid content from 35% to 9%. In 2017, Yan [7] et al. studied the transcription factor GmWRI1a, which is involved in the fatty acid synthesis and glycolysis pathways. Compared with wild-type materials, the oleic acid content was decreased by 11.44%.
In 2012, Schlueter [8] et al. provided a detailed description of the soybean FAD2 gene, indicating that Δ12-fatty acid desaturase (FAD2) is a plastid ω-6 desaturase in soybean. The protein, which is predicted to be distributed primarily on the plastid and endoplasmic reticulum membranes, is responsible for the formation of a double bond at a specific position in the fatty acid chain of oleic acid, between carbons 12 and 13. It is a key enzyme that determines the amount of linoleic acid synthesized. Δ12 FAD is a membrane-bound protein primarily present in the plastid membrane and converts oleic acid (C18:1) to linoleic acid (C18:2). The genes in this family are mainly present on the endoplasmic reticulum and on the plastid membranes of plants [9][10][11] and participate in the synthesis of most unsaturated fatty acids. However, animals have no pathway for synthesizing linoleic acid, and thus it must be ingested in the diet [12][13][14].
Unsaturated fatty acids have two particularly important functions: maintaining the fluidity of membrane lipids and serving as precursors for the synthesis of arachidonic acid [15][16][17][18]. The first report of FAD2 described the enzyme in Arabidopsis. Later, other genes in the family were cloned. These genes are highly homologous to the Arabidopsis gene and share high amino acid sequence identity [19,20].
The methods for determining soybean oleic acid content include grain analysis using near-infrared spectroscopy [21], gas chromatography [22], and nuclear magnetic resonance spectroscopy [23]. For grain analysis using near-infrared spectroscopy, soybean seeds that are smooth, full, and insect-free are selected for testing and placed in the infrared sensing area of the instrument, and the results of at least three replicate seeds are recorded in the computer. In 2008, Zhang [24] et al. established a neural-network-based method for determining soybean oil content by near-infrared spectroscopy and demonstrated the feasibility of this method. In 2007, Chai [25] et al. performed similar research in Heilongjiang. With soybean as the basic material, a near-infrared grain analyser was used to measure the samples and build a more complete spectrum database, and the experimental data were analysed by linear regression to determine the soybean oleic acid content.
In 2017, Wang [26] et al. used a near-infrared grain analyser to establish a rapid system for determining soybean oleic acid content. They selected 289 soybean seeds at Jilin Agricultural University, established a calibration model for soybeans in Jilin Province, and constructed spectra for five fatty acid components. The resulting model provides theoretical and data support for the determination of soybean oleic acid.

Bioinformatics Analysis
Gene domain analysis of the soybean FAD2 family
The amino acid sequences of the seven genes in the soybean FAD2 family (accession numbers are shown in Table 1-1) were compared against the Ensembl Plants database. Sequences without complete open reading frames were excluded, and the domains of each gene were further analysed using the online CDD tool. ProtParam on the ExPASy website was used to analyse the physicochemical properties and isoelectric points of the proteins encoded by the genes in the family.

Transmembrane structure analysis of soybean FAD2 family genes
The transmembrane structure of the proteins in the soybean FAD2 gene family was analysed using TMHMM (http://www.cbs.dtu.dk/services/TMHMM/), as shown in Fig. 1.1. The results showed that the proteins in the family have 2-6 transmembrane regions, and the distance between the regions is very small. The main body of each protein is located primarily on the endoplasmic reticulum, and 100 amino acids at each end lie outside the membrane. Subcellular localization was predicted with Cell-PLoc (http://www.csbio.sjtu.edu.cn/bioinf/Cell-PLoc/), and all the proteins were found to be localized at the endoplasmic reticulum. The two predictions are consistent.

Phylogenetic tree analysis of FAD2 genes among different species
Using MEGA5, the amino acid sequences of the FAD2 family members in each species were retrieved and compared with the NJ method to study the evolutionary position of each gene in the family and the relationships among the genes. As shown in Fig. 1.2, the soybean FAD2 family genes are homologous to the Arabidopsis FAD2 family gene [27]. GmFAD2-2 and GmFAD2-2a have the highest homology with each other; GmFAD2-1 and GmFAD2-1b have the highest homology with each other and are less homologous to the other soybean FAD2 genes, and their functions are also weakly differentiated. No species-specific groups emerged in the phylogenetic tree, which indicates that the FAD2 genes have been relatively conserved throughout evolution.

Analysis of the response element in each gene promoter region
The online tool PlantCARE was used to analyse the promoter region of each gene in the family. As shown in Table 1-3, hormone response elements, light response elements, meristem-related elements, and cis-acting elements involved in low-temperature responses are present in the promoters of the soybean FAD2 gene family, similar to the FAD2 genes of other species subjected to stress [28][29][30].

Analysis of Structural Patterns in the Soybean FAD2 Gene Family
In Fig. 1.3, which shows the intron-exon structure analysis, yellow represents exon regions, blue represents untranslated regions, and black lines represent introns. Because some genes have introns that are too long to display conveniently, all introns are drawn at the same default length. The results showed that the known soybean FAD2 family genes, except for GmFAD2-2a and GmFAD2-2b, are structurally similar, indicating that the genes in the family are highly consistent to some extent.
To study the evolutionary relationship between genes in the family, protein motifs were analysed using the online software MEME. As shown in Fig. 1.4, the soybean FAD2 genes, with the exception of GmFAD2-2a and GmFAD2-2b, clustered together and have almost the same amino acid composition and the same protein motif types, which is consistent with the above results.

Amino acid sequence alignment between species and comparison with the Arabidopsis gene amino acid sequence
Fig. 1.5 shows that the protein domains are similar to those of the Arabidopsis gene and contain three conserved histidine clusters (red box in the figure): HECGHH, HRRHH, and HVAHH. In GmFAD2-2b, the first and third histidine clusters become HDCGHH and HVVHH, respectively, but the positions of the histidine clusters are unchanged. These three histidine clusters are responsible for the primary FAD2 activity; as the main binding sites of Fe2+, they are necessary for completion of the catalytic reaction. The distance between the first and second clusters is relatively small, and the distance between the second and third clusters is large. This result is consistent with FAD2 from other oil species, such as sunflower [31].
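The histidine-cluster comparison described above can be illustrated with a short script. This is only a minimal sketch, not the procedure used in the study: the motif strings are taken from the text, whereas the function names and the toy sequence are hypothetical and were invented for illustration (the toy sequence is not a real GmFAD2 protein).

```python
import re

# Histidine clusters named in the text; the second entry of box1 and box3
# is the GmFAD2-2b variant reported there.
HIS_BOXES = {
    "box1": ("HECGHH", "HDCGHH"),
    "box2": ("HRRHH",),
    "box3": ("HVAHH", "HVVHH"),
}


def find_his_boxes(protein):
    """Return {box name: (matched motif, 0-based start)} scanning left to right."""
    hits = {}
    search_from = 0
    for name, variants in HIS_BOXES.items():
        pattern = re.compile("|".join(variants))
        match = pattern.search(protein, search_from)
        if match is None:
            continue
        hits[name] = (match.group(0), match.start())
        search_from = match.end()  # later boxes must lie downstream
    return hits


def box_gaps(hits):
    """Residues separating box1/box2 and box2/box3, or None if a box is missing."""
    if not all(name in hits for name in HIS_BOXES):
        return None
    (m1, s1), (m2, s2), (_, s3) = (hits[n] for n in ("box1", "box2", "box3"))
    return s2 - (s1 + len(m1)), s3 - (s2 + len(m2))


# Hypothetical toy sequence (NOT a real FAD2 protein), built so that the
# first gap is short and the second gap is longer.
TOY = "MGAGG" + "HECGHH" + "AFSDYQ" + "HRRHH" + "SNTGSLERDEVFVPKQKS" + "HVAHH" + "LFST"

if __name__ == "__main__":
    hits = find_his_boxes(TOY)
    print(hits)
    print(box_gaps(hits))
```

Applied to real aligned sequences, the same scan would report which variant of each cluster a protein carries and how far apart the clusters are.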

Gene expression pattern analysis

Analysis of expression patterns in different organs under low temperature stress
After low temperature stress treatment, the expression of the different genes in different soybean organs was quantified by fluorescence quantitative PCR. The results showed that the family genes fall into two types: the first type comprises GmFAD2-2 and GmFAD2-1b, which showed seed-specific expression, and the second type comprises the remaining genes, which are constitutively expressed. Under low temperature (4 °C) stress treatment, the genes in the family were expressed to varying degrees, as shown in Fig. 1.6.
The above results indicate that the expression patterns of the seven genes in the family differed among soybean tissues and during low temperature treatment. Among them, GmFAD2-2 showed significant expression in the grain, whereas its expression level in other tissues was extremely low. The expression of this gene was up-regulated as the low temperature treatment time increased and was highest at 48 hours of treatment. GmFAD2-1b also showed significant expression in the grain, but its expression was down-regulated as the low temperature treatment time increased, and was highest when the plant had been treated at low temperature for 6 hours.

Analysis of expression patterns of various genes at different stages of grain development
In the process of soybean oleic acid synthesis, the expression level of FAD2 genes directly affects the level of oleic acid. As shown in Fig. 1.7, the expression levels in soybean seeds changed to different degrees at different stages of soybean development, but the overall trend was consistent. In the early stage of soybean grain development, gene expression was higher. As the grain matured, the expression level gradually decreased and finally stabilized.

Determination of oleic acid content at different stages of grain development
Fig. 1.8 shows that in the soybean genotypes "Jike Soybean 20", "Ji Midou 3", "Jike Midou No. 1" and "Jike Fresh Bean 1" evaluated in this experiment, the trend in oleic acid content in the seeds was basically the same. The oleic acid content of "Ji Midou 3" was highest on the 35th day after flowering, whereas that of "Jike Soybean 20", "Jike Midou No. 1", and "Jike Fresh Bean 1" was highest on the 40th day after flowering. The oleic acid content nevertheless differed between the varieties, because the synthesis of oleic acid is affected by differences between genotypes.
Analysis of the differences in FAD2 gene expression levels between grain development stages (Table 1-4) showed that in the genotype "Jike Fresh Bean 1", the total expression of the genes was the largest and the difference was significant, resulting in the low oleic acid content of this variety at the early stage. This is consistent with the experimental data. Among the other varieties, there were no significant differences in the content.

Correlation analysis between gene expression and oleic acid content
By analysing the correlation between the expression level of the FAD2 genes during grain development and oleic acid content (Table 1-5), we found that the gene expression level was negatively correlated with the oleic acid content in "Jike Soybean 20". Among the genes in "Jike Soybean 20", the negative correlation coefficient for GmFAD2-2a was the largest, followed by that for GmFAD2-2b. The negative correlation coefficient for GmFAD2-1 was the largest in both "Ji Midou 3" and "Jike Midou No. 1", and that for GmFAD2-2b was the largest in "Jike Fresh Bean 1", indicating a significant negative correlation.
Analysis of the different expression levels of different genes of the same genotype on the same day (Table 1-6) showed that the levels of the GmFAD2-2b gene were significantly different from those of the other genes in the varieties "Jike Soybean 20" and "Jike Fresh Bean 1". On the 25th, 30th, and 45th day after flowering, there were significant differences, and the expression level was the highest. The expression level of GmFAD2-1 in the varieties "Ji Midou 3" and "Jike Midou No. 1" was significantly different from those of the other genes.
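The kind of correlation coefficient reported in Table 1-5 can be sketched as follows. The text does not state which coefficient SPSS was asked to compute, so the use of Pearson's r here is an assumption, and all numbers in the example are hypothetical placeholders rather than measurements from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical values for one gene in one variety at six sampling days
# (illustrative only; NOT data from the paper).
relative_expression = [4.8, 4.1, 3.0, 2.2, 1.9, 1.8]
oleic_acid_percent = [17.5, 19.0, 21.5, 23.0, 22.4, 22.0]

if __name__ == "__main__":
    r = pearson_r(relative_expression, oleic_acid_percent)
    print(f"r = {r:.3f}")  # a negative r means expression falls as oleic acid rises
```

A coefficient close to -1 for a gene would correspond to the strong negative correlations described for GmFAD2-1 and GmFAD2-2b; significance testing is a separate step that this sketch does not cover.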

Discussion
Soybeans are an important oil crop, and their oil content is valued throughout the world. Although the catalytic functions of the key enzymes in the soybean fatty acid biosynthesis pathway are relatively clear, research on the genes that encode them, especially FAD2, has been inconsistent. The accumulated information is relatively fragmented, and the genetic basis and behaviour of the key enzyme genes, including differences in catalytic efficiency among different genes at the same locus, are poorly understood, which limits efforts to modify the oil composition of soybean grains through joint regulation of the key enzyme genes. The content of oleic acid in soybean seeds and the ratio of fatty acid components are important reference factors for determining the quality of soybeans. However, pod development varies among soybean varieties owing to differences in the seven known FAD2 genes, and a systematic analysis of their expression patterns at different stages has not been reported. Since the genes in this family have certain structural differences, they also differ in function and in expression regulation patterns. The synthesis of oleic acid and other unsaturated fatty acids is a chain reaction process regulated by multiple genes. A single gene can affect not only the synthesis of oleic acid but also the synthesis of other unsaturated fatty acids.
This study found that FAD2 is affected by low temperature and directly affects the fluidity of plant cell plasma membranes. Unsaturated fatty acids are important components of plant cell plasma membranes and play an important role in the normal growth of plants. High plasma membrane fluidity can help plants resist low temperature and other adverse conditions [32][33][34]. In 2012, Byfield [35] found that during soybean germination, the expression level of FAD2 was significantly higher at low temperature than at normal temperature, while at high temperature, the expression level was significantly decreased and, at the same time, the oleic acid content was increased.

Conclusions
In this study, the correlation between FAD2 expression and oleic acid content during pod development showed that FAD2 plays an important role in oleic acid synthesis and that each gene in the family functions differently. In 2015, Li [36] found that oleic acid was undetectable in the early stage of soybean pod development but could be detected 16 days after flowering, and that its content increased greatly in the later stage. Oleic acid synthesis is a process involving multiple genes and co-regulators; thus, simultaneously inhibiting the expression of multiple genes to change the oleic acid content, and selecting for higher oleic acid content, can lead to novel and beneficial soybean germplasm with a broader scope for promotion.

FAD2 family gene bioinformatics analysis

Annotation of FAD2 family genes
A conserved structure was identified based on the National Center for Biotechnology Information (NCBI) database, the soybean database SoyBase (https://soybase.org/), and the Pfam (http://pfam.xfam.org/) database. The chromosomal locations and conserved domains of the seven genes in the soybean FAD2 family were identified using HMMER 3.0 software.

FAD2 gene evolutionary tree analysis and sequence alignment
The program Clustal X (version 3.0) was used to align the sequences of the soybean FAD2 family, and MEGA 5.0 software was used to construct a phylogenetic tree of all functionally defined FAD2 genes, both within soybean and between species, in order to classify the genes. The neighbour-joining (NJ) method was used to build the tree.
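Neighbour-joining starts from a matrix of pairwise distances between aligned sequences. The sketch below computes the simplest such distance, the p-distance (proportion of differing residues), for a toy alignment. It is an illustration under stated assumptions rather than a reproduction of the MEGA workflow: the text does not say which substitution model was selected in MEGA, and the sequence names and aligned fragments are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue  # pairwise deletion of gapped columns
        compared += 1
        if res_a != res_b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment):
    """Return {(name_1, name_2): p-distance} for every unordered pair of sequences."""
    return {
        (name_1, name_2): p_distance(alignment[name_1], alignment[name_2])
        for name_1, name_2 in combinations(alignment, 2)
    }


# Hypothetical aligned fragments (illustrative only; NOT real FAD2 sequences).
TOY_ALIGNMENT = {
    "seqA": "HECGHHAFSD",
    "seqB": "HDCGHHAFSD",
    "seqC": "HECGHH-FSE",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(TOY_ALIGNMENT).items():
        print(pair, round(dist, 3))
```

Pairs with the smallest distances, such as the closely related gene pairs described in the results, are the ones a neighbour-joining tree would tend to group first.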

Analysis of FAD2 gene protein sequences
The online website MEME (http://meme.nbcr.net/meme) was used to predict the structure and classification of other protein sequences in addition to the conserved sequences in the gene family.
The protein sequences of all the genes in the family were entered in the MEME website, and the parameters were all set to the default parameters.

Analysis of gene expression patterns

Experimental materials

Plant Materials
The soybean genotypes "Jike Soybean 20", "Ji Midou 3", "Jike Fresh Bean 1", and "Jike Midou No. 1" were selected and provided by the Plant Biotechnology Center of Jilin Agricultural University. Among them, "Jike Fresh Bean 1" has a relatively low oleic acid content, while the other genotypes have relatively high oleic acid contents.

Required reagents
The RNA extraction kit, real-time PCR kit, DNA standard molecular weight marker, and DL2000 kits were from TaKaRa. Fluorescent quantitative primers (Table S1) were synthesized by Changchun Kumei Biotechnology Co., Ltd. The plant RNA extraction kits were purchased from Omega. Sodium chloride was purchased from Thermo Fisher Company, and the other major reagents were obtained domestically and were of analytical grade.

Instruments and equipment
A fluorescence quantitative polymerase chain reaction instrument (purchased from the United States), a refrigerated centrifuge (4 °C), a −80 °C freezer, a water bath, and an indoor plant culture room were used.

Low Temperature Stress Treatment
Clean and full "Jike Soybean 20" soybean seeds were chosen and planted in small plastic buckets with a hole punched in the bottom for drainage. The planted seeds were placed in an artificial climate chamber that simulated the growth environment under natural conditions and were allowed to grow for approximately 2 weeks after germination. When the first three leaves were present, the seedlings were subjected to abiotic stress treatment using the following method.
Low temperature stress: 20 days after germination, the seedlings were placed in a refrigerator at 4 °C for cold treatment. After treatment for 0 h, 1 h, 3 h, 6 h, 12 h, 24 h, and 48 h, root, stem, leaf and germinated seed samples were taken. The control plants were watered every other day. Replicates were prepared for each treatment group to reduce experimental error.
After the samples were collected, they were stored in a −80 °C freezer.

Determination of oleic acid content and relative expression of FAD2 genes at different stages of grain development
The soybean test strains "Jike Soybean 20", "Ji Midou 3", "Jike Fresh Bean 1" and "Jike Midou No. 1" were sown in the experimental field of Jilin Agricultural University at the end of April and harvested at the end of September. The flowering period was recorded, and the first sampling was performed on the 25th day after flowering. Every 5 days, developing young pods were collected, placed in liquid nitrogen immediately, and then stored in a freezer at −80 °C. A total of 5 young pods were taken, and 3 replicate samples were used. A DS2500 grain analyser was used to measure the change in oleic acid content at different periods of grain development. Primer 5.0 software was used to design fluorescent quantitative primers (Table S1) for the genes in the soybean FAD2 family, and the expression levels in pods at different stages of development were determined.

Real-time PCR
In this experiment, cDNA of the internal reference gene β-actin and the target genes was amplified by a two-step method using a fluorescence quantitative analyser, and three replicate experiments were performed for each set of samples. The qRT-PCR reaction mixture was prepared on ice following the instructions of the fluorescence quantitation kit. To reduce pipetting error during loading and ensure accuracy, the reaction mixture was mixed uniformly with a pipette and then dispensed into sterile PCR tubes, and finally, reverse transcription was carried out. The cDNA sample was added, and the tubes were centrifuged and placed in the qRT-PCR thermocycler (Table 1-7).

Determination of oleic acid content in soybean seeds by near-infrared grain analysis
In this experiment, clean grain was selected for testing and put into the measuring cup of an NIRS DS2500 grain analyser. The seed volume needed to cover the bottom of the cup and the infrared scanning area (if the seed quantity is small, the seed can be ground into powder before being placed in the measuring cup). The sample was placed into the sample tank of the near-infrared grain analyser, and spectra were acquired according to the previously established laboratory procedure for determining soybean oleic acid content. Analysis of each sample was repeated 3 times, and the measurement results were determined by the software Operator and automatically saved in the computer. The error between the results measured by the near-infrared grain analyser and by gas chromatography was previously verified to be between 1% and 1.5%, which is within the acceptable range, indicating that the near-infrared results are reliable and that the method can be widely applied to the determination of fatty acids.

Data analysis
All data in this experiment were analyzed using SPSS 2.0 software. Each group of experiments was set up with three replicates to reduce experimental error. ANOVA was used to analyze the correlation between the difference in

Figures

Figure 1 Analysis of transmembrane structure of gene proteins.
Figure 2 Phylogenetic analysis of FAD2 genes among species
Figure 3 Structural comparison of introns and exons in different genes
Analysis of protein sequences encoded by the genes
Figure 5 FAD2 amino acid sequence in Arabidopsis thaliana and soybean
Figure 6 Relative expression of FAD2 genes in different plant parts under low temperature stress
Figure 7 Variation in gene expression in different cultivars at different post-anthesis stages

Table 1-1
Function and identification number of soybean fatty acid desaturase genes

Table 1-4
Significance analysis of FAD2 gene expression at different grain development stages

Table 1
Analysis of the different expression levels of different genes of the same genotype on the same day (Table 1-6)

Table 1-6
Significance analysis of the difference in the expression amount of different genes in the same variety

Table 1
The amplification conditions were predenaturation at 95 °C for 10 min, followed by 55 cycles of denaturation at 95 °C for 10 s, annealing at 55 °C for 20 s, and extension at 72 °C for 15 s. The relative expression of the target gene was calculated using the 2^−ΔΔCt method.
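For reference, the 2^−ΔΔCt calculation can be written out as a short script. In this method, ΔCt = Ct(target) − Ct(reference) and ΔΔCt = ΔCt(sample) − ΔCt(calibrator). The sketch assumes that replicate Ct values are averaged before subtraction and that one condition serves as the calibrator; the text does not state which sample was used as the calibrator, and the function names and Ct values below are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene (e.g. beta-actin)."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(sample_target, sample_ref, calib_target, calib_ref):
    """Relative expression of the target in the sample versus the calibrator (2^-ddCt)."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(calib_target, calib_ref)
    return 2 ** (-ddct)


# Hypothetical triplicate Ct values (illustrative only; NOT data from the paper).
sample_target = [24.0, 24.5, 23.5]   # target gene, treated sample
sample_ref = [18.0, 18.25, 17.75]    # reference gene, treated sample
calib_target = [26.0, 26.5, 25.5]    # target gene, calibrator sample
calib_ref = [18.0, 18.0, 18.0]       # reference gene, calibrator sample

if __name__ == "__main__":
    # dCt(sample) = 6, dCt(calibrator) = 8, so ddCt = -2 and the fold change is 4.
    print(fold_change(sample_target, sample_ref, calib_target, calib_ref))
```

A value above 1 indicates higher expression in the sample than in the calibrator, and a value below 1 indicates lower expression. The formula assumes roughly equal amplification efficiency for the target and reference genes.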