Global analysis of lysine acetylation reveals its role in multiple biological processes in Glycine max


Protein lysine acetylation (Kac) is an important post-translational modification present in both animal and plant cells. Here, we report the results of a proteomic investigation of Kac in soybean leaves. In total, we identified 3148 acetylation sites in 1538 proteins from three biological replicates, including 59 lysine acetylation sites in core histones; this represents the largest acetylome dataset in plants to date. Gene Ontology (GO) functional analysis showed that most of the acetylated proteins are involved in metabolic processes (including carboxylic acid metabolic process, oxoacid metabolic process, nucleoside metabolic process, nucleoside phosphate metabolic process, and ribose phosphate metabolic process). KEGG pathway enrichment showed that Kac plays an important role in photosynthesis, carbon fixation in photosynthetic organisms and the citrate cycle (TCA cycle). We also found a total of 17 conserved Kac motifs. Altogether, our study not only provides the first global and most extensive lysine acetylation analysis in soybean leaves, but also suggests that lysine acetylation plays an important and unique role in plants.


Introduction
Glycine max, commonly referred to as soybean, has been cultivated in China for nearly 5000 years and has become one of the most important economic crops in the world (Li et al., 2008). Post-translational modifications (PTMs) are known to regulate many cellular processes; they are dynamic and reversible and can change protein function (Westermann and Weber, 2003). To date, about 400 PTMs have been detected, such as acetylation, ubiquitination, phosphorylation, malonylation, succinylation and methylation (Colak et al., 2013; Weinert et al., 2013).
Lysine acetylation (Kac) is one of the most important PTMs and is involved in various biological processes in both eukaryotes and prokaryotes, including cellular metabolism, the cell cycle and microtubule stability (Choudhary et al., 2009; Christensen et al., 2019; Henriksen et al., 2012). Kac was first detected in histones (Allfrey and Mirsky, 1964); however, lysine acetylation is not limited to histones (Narita et al., 2019). An earlier study detected 388 Kac sites in 195 proteins in human HeLa cells and mouse liver cells (Kim et al., 2006). More recent studies coupling immunoaffinity enrichment of acetylated peptides with LC-MS/MS have discovered many other acetylated proteins (Guan et al., 2010; Luo et al., 2014). In Vibrio parahemolyticus, 1413 lysine acetylation sites were identified in 656 proteins (Pan et al., 2014). In Synechocystis, lysine acetylome analysis identified 776 acetylation sites in 513 proteins (Mo et al., 2015). A study in strawberry leaves detected 1392 Kac sites in 684 acetylated proteins (Xu et al., 2017), and a proteomic analysis in Oryza sativa discovered 1337 Kac sites on 716 acetylated proteins (Bai et al., 2019).
Despite many studies of lysine acetylation in bacteria, mammalian cells and plants, the available datasets are not large enough for a comprehensive exploration of lysine acetylation functions. In this study, we used anti-acetyllysine-based enrichment and liquid chromatography-tandem mass spectrometry (LC-MS/MS) to acquire a lysine acetylome of Glycine max. Overall, we identified 3148 acetylation sites in 1538 proteins, representing the largest acetylome dataset in plants to date. Our study demonstrates that lysine acetylation is involved in multiple metabolic and cellular processes, especially photosynthesis, fatty acid biosynthesis and carbon fixation.
Then frozen in liquid nitrogen and stored at −80 °C.
Protein extraction and trypsin digestion
Proteins were extracted from soybean leaves as previously reported, with some modifications. Briefly, samples were ground in liquid nitrogen and homogenized in lysis buffer (8 M urea, 1% Triton X-100, 10 mM dithiothreitol (DTT), and 1% Protease Inhibitor Cocktail). The remaining debris was removed by centrifugation at 15,000 g for 15 min at 4 °C. The supernatant was then precipitated with ice-cold acetone for more than 4 hours at −20 °C and centrifuged at 15,000 g for 15 min at 4 °C. The resulting protein pellet was washed with cold acetone three times and stored at −80 °C for further use. For digestion, the protein was dissolved in buffer (8 M urea, 100 mM NH4HCO3, pH 8.0). The protein solution was then reduced with 5 mM DTT for 30 min at 56 °C and alkylated with 11 mM iodoacetamide for 15 min at room temperature in darkness. After dilution with 100 mM TEAB to reduce the urea concentration to less than 2 M, a two-step trypsin digestion was carried out according to the method of Zhang et al. After digestion, the peptides were desalted on a Strata X C18 SPE column (Phenomenex) and vacuum-dried.

HPLC fractionation and affinity enrichment
After labelling, the peptides were fractionated by high-pH reverse-phase HPLC using a Thermo Betasil C18 column with mobile buffer A (98% H2O and 2% acetonitrile with 10 mM ammonium formate, … For affinity enrichment, the peptide fractions were incubated with pre-washed pan anti-acetyllysine antibody beads (Cell Signaling Technology, Danvers, USA) in NETN buffer (100 mM NaCl, 1 mM EDTA, 50 mM Tris-HCl, 0.5% NP-40, pH 8.0) at 4 °C overnight with gentle shaking. After washing four times with NETN buffer and twice with double-distilled water, the lysine-acetylated peptides bound to the agarose beads were eluted with 0.1% trifluoroacetic acid (Gu et al., 2016). Finally, the eluted fractions were combined and vacuum-dried for further use.

LC-MS/MS analysis
The dried peptides were first dissolved in 0.1% formic acid (FA) and separated using a reversed-phase analytical column (15 cm length, 75 μm i.d.) on an EASY-nLC 1000 UPLC system. The peptides were then subjected to an NSI source followed by tandem mass spectrometry (MS/MS) in a Q Exactive (Thermo Scientific) coupled online to the UPLC system. Intact peptides were detected in the Orbitrap at a resolution of 70,000 (m/z 200), with a Normalized Collision Energy (NCE) setting of 28. For the MS scan, the m/z range was set from 350 to 1800. A data-dependent procedure that alternated between one MS scan and 20 MS/MS scans was applied to the top 20 precursor ions above a threshold ion count of 3E4 in the MS survey scan, with 15.0 s dynamic exclusion. Automatic gain control (AGC) was used to prevent overfilling of the Orbitrap; 2E5 ions were accumulated for generation of MS/MS spectra.

MS/MS Database search
The acquired MS/MS raw data were searched with the MaxQuant search engine (version 1.5.2.8) against the Glycine max database from UniProt (74863 sequences; https://www.uniprot.org/proteomes/UP000008827), concatenated with a reverse decoy database. False discovery rate (FDR) thresholds for lysine-acetylated peptides and proteins were set at 1%. Trypsin/P was specified as the cleavage enzyme, allowing up to 2 missed cleavages, and the minimum peptide length was set to 7 amino acids. The Kac site localization probability threshold was set to > 0.75. All MS data were deposited in the ProteomeXchange Consortium via the PRIDE partner repository under accession number PXD021246.
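The filtering criteria above can be illustrated with a minimal Python sketch. It is not the MaxQuant implementation: the `SiteHit` record, its field names and the example hits are hypothetical, and the decoy-to-target ratio is only the commonly used target-decoy estimate of the FDR, assumed here for illustration.

```python
from dataclasses import dataclass

@dataclass
class SiteHit:
    """Hypothetical record for one candidate Kac site (field names are assumptions)."""
    protein: str
    position: int      # 1-based position of the lysine in the protein
    peptide: str       # peptide sequence carrying the site
    loc_prob: float    # site localization probability (0 to 1)
    is_decoy: bool     # True if the hit comes from the reverse decoy database

MIN_LOC_PROB = 0.75    # sites must have localization probability > 0.75
MIN_PEPTIDE_LEN = 7    # minimum peptide length in amino acids

def passes_filters(hit: SiteHit) -> bool:
    """Keep target hits with sufficient localization probability and peptide length."""
    return (not hit.is_decoy
            and hit.loc_prob > MIN_LOC_PROB
            and len(hit.peptide) >= MIN_PEPTIDE_LEN)

def estimate_fdr(n_decoy: int, n_target: int) -> float:
    """Simple target-decoy FDR estimate: decoy hits divided by target hits."""
    return n_decoy / n_target if n_target else 0.0

# Invented example hits, for illustration only.
hits = [
    SiteHit("P1", 10, "AAKAAAAR", 0.99, False),    # passes all filters
    SiteHit("P2", 5, "AAKAAR", 0.99, False),       # peptide shorter than 7
    SiteHit("P3", 7, "GGKGGGGK", 0.75, False),     # probability not above 0.75
    SiteHit("REV_P4", 3, "RAAAAKAA", 0.95, True),  # decoy hit
]
kept = [h for h in hits if passes_filters(h)]
```

In this toy example only the first hit is retained; the other three each fail exactly one criterion.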

Bioinformatics analysis
The UniProt-GOA database (http://www.ebi.ac.uk/GOA/) was used for Gene Ontology (GO) annotation of the proteome. UniProt protein IDs were first mapped to GO IDs; if an identified acetylated protein could not be mapped, InterProScan (http://www.ebi.ac.uk/interpro/search/sequence/) was used to obtain its GO functional annotation based on protein sequence alignment (Dimmer et al., 2012). GO annotations classify proteins into three categories: biological process, cellular component and molecular function. For motif analysis, the motif-x algorithm was used to analyze the sequences spanning 10 amino acids upstream and downstream of every identified Kac site (Schwartz and Gygi, 2005). Protein-protein interactions among the identified proteins were obtained from the STRING database (v11.0), retaining all interactions with a confidence score ≥ 0.7 (high confidence), and were analyzed with Cytoscape (v3.7).
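The sequence windows used as motif-analysis input (10 residues on each side of the acetylated lysine) can be extracted as in the following sketch. The function name `site_window` and the use of `_` to pad windows that run past a protein terminus are assumptions made for illustration; the source does not describe how terminal sites were handled.

```python
def site_window(sequence: str, position: int, flank: int = 10, pad: str = "_") -> str:
    """Return the residues from -flank to +flank around a lysine.

    `position` is 1-based. Windows that extend beyond either terminus
    are padded with `pad` so every window has length 2 * flank + 1.
    """
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[idx] != "K":
        raise ValueError("residue at the given position is not lysine")
    left = sequence[max(0, idx - flank):idx].rjust(flank, pad)
    right = sequence[idx + 1:idx + 1 + flank].ljust(flank, pad)
    return left + "K" + right

# Toy sequence, not a real soybean protein.
example = site_window("MKAAAK", 2, flank=3)
```

With the default `flank=10`, each window is 21 residues long with the acetylated lysine at the center.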

Systematic Profiling of Lysine Acetylation in Soybean Leaves
In this study, to globally investigate the acetylome in soybean leaves, we coupled immunoaffinity enrichment with high-resolution LC-MS/MS (Figure 1A). As a preliminary survey of lysine acetylation in soybean, total proteins isolated from soybean leaves were examined by western blot using a pan anti-acetyllysine antibody (Micrometer Biotech Company, Hangzhou, China). We detected many major protein bands with molecular weights higher than those of histones (Figure 1B).
To verify the MS/MS data, we examined the distributions of mass errors and lengths of the identified peptides. The mass errors of most identified peptides were close to 0 and less than 2 ppm. Furthermore, the lengths of the identified peptides ranged from 7 to 15 amino acids, consistent with the properties of tryptic peptides, which supports the accuracy of the MS/MS data (Figure 1C,D). In total, we identified 3148 acetylation sites in 1538 proteins from three biological replicates with a false discovery rate below 1%. These include 59 lysine acetylation sites in core histones, among 28 …
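For reference, the mass error reported in parts per million is conventionally the difference between observed and theoretical mass, divided by the theoretical mass and scaled by 10^6. A minimal sketch of that check follows; the function names and the example masses are invented for illustration.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6

def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tol_ppm: float = 2.0) -> bool:
    """True if the absolute mass error is below the ppm tolerance."""
    return abs(ppm_error(observed_mass, theoretical_mass)) < tol_ppm

# Invented masses: a deviation of 0.001 Da at 1000 Da is about 1 ppm.
error = ppm_error(1000.001, 1000.0)
```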

Functional classification of acetylated proteins in Soybean Leaves
To better elucidate the potential roles of the lysine acetylome in soybean leaves, we performed Gene Ontology (GO) functional classification of all identified acetylated proteins based on biological process, molecular function and cellular component (Figure 2A). In the biological process classification, 798 acetylated proteins were involved in metabolic processes (52%) and 619 in cellular processes (40%). In the molecular function classification, 703 acetylated proteins were related to catalytic activity (43%), followed by 667 related to binding of various targets (41%) and 128 involved in structural molecule activity (8%) (Supplementary Table 2). These results are highly consistent with the acetylated proteins identified in other organisms (Li et al., 2016; Xue et al., 2018).
For subcellular localization, 42% of the acetylated proteins were predicted to be located in the chloroplast and 29% in the cytoplasm (Figure 2B). The results showed that the largest proportion of acetylated proteins was assigned to photosynthesis (Supplementary Table 3).

Functional enrichment analysis of acetylated proteins
To further elucidate the biological functions of acetylated proteins in soybean leaves, we performed GO (biological process, molecular function and cellular component) and KEGG pathway enrichment analyses (Figure 3). In the biological process category, most of the acetylated proteins were involved in metabolic processes, including carboxylic acid metabolic process (p value = 1.23e-34), oxoacid metabolic process (p value = 6.67e-33), nucleoside metabolic process (p value = 9.78e-13), nucleoside phosphate metabolic process (p value = 2.93e-12) and ribose phosphate metabolic process (p value = 3.03e-12). This result agrees with the functional classification analysis, in which most acetylated proteins were involved in cellular metabolism. In addition, a large number of acetylated proteins were significantly enriched in the glycosyl compound biosynthetic process (p value = 1.19e-11) and amide biosynthetic process (p value = 5.18e-12). KEGG pathway enrichment analysis indicated that most of the acetylated proteins were related to photosynthesis, carbon fixation in photosynthetic organisms and the citrate cycle (TCA cycle).
Taken together, the wide distribution of lysine-acetylated proteins across diverse pathways suggests that lysine acetylation plays an important regulatory role in almost every aspect of cell metabolism in soybean leaves.
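The text does not state which statistical test produced the enrichment p values. A one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) is a common choice for GO and KEGG enrichment, and the sketch below assumes it purely for illustration. The function name and the small example counts are invented and are not taken from the dataset.

```python
from math import comb

def hypergeom_enrichment_p(background: int, annotated: int,
                           selected: int, overlap: int) -> float:
    """P(X >= overlap) when drawing `selected` proteins without replacement.

    background: proteins in the reference set
    annotated:  background proteins carrying the term
    selected:   proteins in the list being tested (e.g. acetylated proteins)
    overlap:    selected proteins carrying the term
    """
    if not (0 <= annotated <= background and 0 <= selected <= background):
        raise ValueError("annotated and selected must lie within the background")
    if not 0 <= overlap <= min(annotated, selected):
        raise ValueError("overlap cannot exceed annotated or selected counts")
    upper = min(annotated, selected)
    # Sum the exact integer counts first, then divide once to limit rounding.
    tail = sum(comb(annotated, i) * comb(background - annotated, selected - i)
               for i in range(overlap, upper + 1))
    return tail / comb(background, selected)

# Toy numbers: 5 of 10 proteins carry a term and all 5 selected proteins carry it.
p_toy = hypergeom_enrichment_p(10, 5, 5, 5)
```

In practice the resulting p values are usually corrected for multiple testing across all terms; whether and how that was done here is not stated in the text.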

Analysis of acetylated-lysine peptide motifs
To evaluate the possible sequence motifs surrounding the acetylated lysine sites, we analyzed the 3148 identified acetylated-lysine peptides from positions −10 to +10 around the Kac site using the motif-x algorithm (Schwartz and Gygi, 2005). A total of 2840 acetylated peptides matched 17 conserved motifs. The KacK, KacR, Kac*K, KacS, KacH and KacN motifs accounted for the highest proportions, with 478, 428, 264, 261, 238 and 196 acetylated peptides, respectively (Kac represents acetylated lysine and * represents a random other amino acid residue) (Figure 4). In line with our observations, most acetylation motifs identified in soybean were also found in Fragaria ananassa (Fang et al., 2015), Oryza sativa (Li et al., 2008) and Camellia sinensis (Xu et al., 2017). This supports the view that lysine acetylation is a highly conserved modification across different organisms. According to the heat map of amino acid composition around the acetylation site, alanine (A), glycine (G) and valine (V) had the highest frequency at positions −1 to −4, while serine (S), arginine (R) and lysine (K) occurred least often. These results suggest that lysine residues flanked by A, G and V, but not by S, R and K, are preferred targets of lysine acetyltransferases in soybean.
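The motif notation can be made concrete with a short sketch: KacK places K at position +1 relative to the acetylated lysine, and Kac*K places K at position +2. The code below only counts windows that match such a single-position pattern. It does not reproduce the statistical, iterative motif-x procedure, and the function names and example windows are invented for illustration.

```python
def matches_motif(window: str, offset: int, residue: str) -> bool:
    """Check whether `residue` sits at `offset` from the central lysine.

    `window` must have odd length with the acetylated K at its center,
    e.g. a 21-residue window spanning positions -10 to +10.
    """
    center = len(window) // 2
    if len(window) % 2 == 0 or window[center] != "K":
        raise ValueError("window must be odd-length and centered on K")
    idx = center + offset
    return 0 <= idx < len(window) and window[idx] == residue

def count_motif(windows, offset: int, residue: str) -> int:
    """Number of windows matching the single-position motif."""
    return sum(matches_motif(w, offset, residue) for w in windows)

# Invented 7-residue windows (positions -3 to +3) for illustration.
toy_windows = ["AAAKKAA", "AAAKAKA", "GGGKRGG"]
n_kack = count_motif(toy_windows, 1, "K")        # KacK: K at +1
n_kac_x_k = count_motif(toy_windows, 2, "K")     # Kac*K: K at +2
n_kacr = count_motif(toy_windows, 1, "R")        # KacR: R at +1
```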

Secondary structure analysis of acetylated lysine
To evaluate the relationship between protein structure and lysine acetylation in soybean, we used NetSurfP (v1.1) to analyze the secondary structures of all acetylated proteins. Nearly 30% of acetylated lysines and 27.5% of all lysines were located in α-helices, 6.0% of acetylated lysines and 6.4% of all lysines were located in β-strands, and approximately 63% of acetylated lysines and 66% of all lysines were located in disordered coils. In addition, 38.7% of acetylated lysines and 66% of all lysines were exposed on the protein surface (Figure 5). Because the distribution patterns of acetylated lysines and all lysines did not differ markedly, lysine acetylation likely does not affect protein secondary structure.

Protein-protein interaction network analysis
Interactions between proteins within cells are very important, since many molecular processes involve large numbers of protein components (Subramaniam et al., 2013). To further evaluate how interactions among acetylated proteins regulate multiple metabolic processes and cellular functions, we generated a protein-protein interaction network: all identified acetylated proteins were queried against the STRING database (version 11.0), and Cytoscape software (v3.7.0) was used to visualize the resulting network. In total, 1503 acetylated proteins were mapped to the STRING database. A graph-theoretical clustering algorithm, molecular complex detection (MCODE), was used to analyze densely connected regions. As shown in Figure 6, the top five clusters identified were the Ribosome (Cluster I), Oxidative phosphorylation (Cluster II), Proteasome (Cluster III), TCA cycle (Cluster IV) and Photosynthesis (Cluster V).
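The high-confidence cutoff used when building the network can be sketched as a simple edge filter. The sketch assumes interactions are available as (protein A, protein B, combined score) tuples with scores on a 0 to 1000 integer scale, as in STRING's downloadable files, so that 0.7 corresponds to 700. The protein identifiers and scores are invented, and the degree count is only a simple connectivity summary, not the MCODE clustering algorithm.

```python
from collections import defaultdict

HIGH_CONFIDENCE = 700  # confidence score 0.7 expressed on a 0-1000 scale (assumption)

def filter_edges(edges, min_score: int = HIGH_CONFIDENCE):
    """Keep interactions whose combined score meets the cutoff."""
    return [(a, b, score) for a, b, score in edges if score >= min_score]

def node_degrees(edges):
    """Count how many retained interactions each protein takes part in."""
    degree = defaultdict(int)
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)

# Invented example interactions, for illustration only.
example_edges = [
    ("protA", "protB", 950),
    ("protB", "protC", 700),
    ("protA", "protC", 420),  # below the cutoff, dropped
]
kept_edges = filter_edges(example_edges)
degrees = node_degrees(kept_edges)
```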

Conclusions
In this study, we presented the first comprehensive analysis of the acetylome in soybean leaves using antibody affinity enrichment and high-resolution LC-MS/MS. We identified 3148 acetylation sites in 1538 proteins, which represents the largest dataset of lysine acetylation in plants to date and expands the catalogue of known lysine acetylation in plants. GO functional characterization showed that lysine acetylation participates in various biological processes and occurs in various cellular components. KEGG enrichment analysis showed that lysine-acetylated proteins were significantly enriched in photosynthesis, carbon fixation in photosynthetic organisms and the citrate cycle, all of which are closely connected with plant photosynthesis and metabolism. Our study provides a global analysis of lysine acetylation; the dataset reveals that lysine acetylation is involved in diverse biological processes and can serve as a resource for studying other plants.

Conflict of interest
The authors have declared no conflicts of interest. All authors read and approved the final manuscript.

Probabilities of acetylated lysine and all lysine secondary structures (alpha helix, beta-strand, disordered coil and surface accessibility).