Enabling global analysis of protein citrullination and homocitrullination via biotin thiol tag-assisted mass spectrometry

Citrullination and homocitrullination are key post-translational modifications (PTMs) that affect protein structures and functions. Although they have been linked to various biological processes and disease pathogenesis, the underlying mechanism remains poorly understood due to a lack of effective tools to enrich, detect, and localize these PTMs. Herein, we report the design and development of a biotin thiol tag that enables derivatization, enrichment, and confident identification of these two PTMs simultaneously via mass spectrometry. We perform global mapping of the citrullination and homocitrullination proteomes of mouse tissues. In total, we identify 1,198 citrullination sites and 108 homocitrullination sites from 619 and 79 proteins, respectively, representing the largest datasets to date. We discover novel distribution and functions of these two PTMs. We also perform multiplexing quantitative analysis via isotopic labeling techniques. This study depicts a landscape of protein citrullination and homocitrullination and lays the foundation to further decipher their physiological and pathological roles.


Introduction
Protein citrullination/deimination is an emerging post-translational modification (PTM) resulting from the conversion of peptidyl arginine to citrulline and is catalyzed by a calcium-regulated family of enzymes called protein arginine deiminases (PADs) (Fig. 1a) 1,2. Protein homocitrullination/carbamylation is another chemically related PTM that occurs on lysine side chains. However, it is known as a nonenzymatic PTM and its expression is highly associated with the level of cyanate in vivo (Fig. 1b) 3. These two types of PTMs lead to the loss of positive charges on the basic amino acid residues under physiological conditions, and therefore have a profound effect on protein conformations, protein-protein interactions and protein functions 1,2.
The pathological involvement of these two PTMs was initially explored in rheumatoid arthritis in which pain in the joints is caused by PAD dysregulation. Proteins with aberrant citrullination and homocitrullination also stimulate the generation of anti-citrullinated protein antibodies that are related to atypical autoimmune and inflammatory responses 3-9. In another extensively studied disease, multiple sclerosis, excessive citrullination of myelin basic protein (MBP) is considered to be a major driver of partial unfolding of myelin sheath and the resultant impaired neuronal signal transduction 10-12. Moreover, recent accumulating evidence has revealed that citrullination and homocitrullination are associated with the development of diverse pathological states including prion disease 13, psoriasis 14, Alzheimer's disease (AD) 15-17 and cancers 18-21, which raises a fast-growing interest in studying these two important PTMs.
Despite the emerging interest, knowledge of the citrullination and homocitrullination proteome is still limited primarily due to the lack of effective analytical tools. Antibody-based techniques such as Western blotting and immunohistochemistry are currently the most prevalent methods to detect these PTMs 22-24. However, these approaches are neither suitable for high-throughput analysis nor able to pinpoint exact sites of the PTMs with confidence 25,26. Mass spectrometry (MS)-based strategies, on the other hand, are gaining popularity as powerful tools for large-scale characterization and localization of various PTMs.
However, the application of MS to mapping the citrullination and homocitrullination proteome suffers from several challenges 26,27. Firstly, signals of these low-abundance PTMs can be largely suppressed by other molecules in the sample and effective enrichment methods are lacking. Secondly, the small mass shift induced by citrullination (+0.984 Da) is easily confused with deamidation (+0.984 Da) and 13C isotopic peaks (+1.0033 Da). These limitations contribute to poor-quality tandem MS spectra, which pose challenges for confident identification and localization of these PTMs. To combat these issues, significant effort has been devoted to improving aspects of the analytical workflow. However, none of the reported methods have overcome all the difficulties so far. For example, direct MS analysis is possible but often requires high mass accuracy of the instrument and time-consuming manual examination of the spectra 28,29. Delicate searching algorithms and statistical modeling have also been developed to aid in the direct analysis 30,31. Chemical derivatization of the PTMs prior to analysis is an alternative to enlarge the mass shift but usually suffers from incomplete reaction 32,33. The above-mentioned strategies did not address the intrinsic low abundance of these PTMs either. Alternative studies have sought the means of using chemical probes for simultaneous introduction of mass shift and enrichment groups. Nevertheless, previous designs led to unsatisfying fragmentation of the peptide backbones and thus limited the identified citrullination and homocitrullination sites 6,34,35.
Here, we design a novel biotin thiol tag that enables derivatization and enrichment of citrullinated and homocitrullinated peptides with high specificity and efficiency. We then develop a reliable and robust proteomics approach for large-scale characterization of these PTMs from complex samples. The utility of this pipeline is demonstrated by comprehensive profiling of the landscape of protein citrullination and homocitrullination from different mouse tissues. Furthermore, we combine this novel method with MS-based quantitation strategies, such as isotopic dimethyl labeling, to achieve multiplexed quantitative analysis of citrullination and homocitrullination from various biological samples.
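To make the mass-shift problem concrete, the following minimal sketch computes how close a citrullinated peptide sits to the first 13C isotopic peak of its unmodified counterpart. The constants are standard monoisotopic mass differences that are not taken from this paper (the text quotes them rounded), and the function names are illustrative only.

```python
# Standard monoisotopic mass differences in Da (assumed values, rounded in the text).
CITRULLINATION_SHIFT = 0.984016  # Arg -> Cit (+O, -NH); deamidation gives the same shift
C13_SPACING = 1.003355           # 13C minus 12C


def gap_da() -> float:
    """Absolute gap (Da) between the citrullination shift and the 13C spacing."""
    return C13_SPACING - CITRULLINATION_SHIFT


def gap_ppm(neutral_mass: float) -> float:
    """Same gap expressed in ppm of the peptide's neutral mass."""
    return gap_da() / neutral_mass * 1e6


def gap_mz(charge: int) -> float:
    """Same gap on the m/z axis for a given charge state."""
    return gap_da() / charge


if __name__ == "__main__":
    for mass in (1000.0, 1500.0, 3000.0):
        print(f"{mass:7.1f} Da: {gap_ppm(mass):5.2f} ppm")
    for z in (1, 2, 3):
        print(f"z = {z}: {gap_mz(z):.4f} m/z")
```

The gap shrinks in relative terms as the peptide mass grows and on the m/z axis as the charge increases. Because deamidation produces the identical shift, mass alone cannot separate it from citrullination at any accuracy.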

Results
Development of a novel biotin thiol tag for citrullination analysis. Protein citrullination and homocitrullination both feature a ureido group on the side chains that can be used for chemical derivatization as previously reported 6,22,34,35. Here, we design a biotin thiol tag that can be easily synthesized with low cost (Supplementary Fig. 1) and can specifically react with citrulline or homocitrulline residues together with 2,3-butanedione (Fig. 1c). This derivatization not only increases the mass shift to allow more confident identification, but also introduces a biotin moiety that enables subsequent enrichment of the modified molecules.
We first performed a proof-of-principle test using a synthetic peptide standard containing one citrullination site within the sequence (SAVRACitSSVPGVR) (Supplementary Fig. 2a). After 6 h, the reaction was complete without any observable side products (Supplementary Fig. 2b), suggesting a high specificity towards the ureido group. The low-abundance peak at m/z 1392 corresponds to the loss of the biotin moiety caused by in-source fragmentation when using a matrix-assisted laser desorption/ionization (MALDI) source. We then evaluated the enrichment performance by spiking the derivatized peptide standard into a complex peptide mixture (1:400, w/w) (Supplementary Fig. 2c) followed by enrichment with streptavidin beads (Supplementary Fig. 2d). The results indicate that derivatized citrullinated peptides can be enriched with excellent specificity and released from streptavidin beads for MS analysis.
The peak at m/z 1392 is still present after enrichment, which further proves that it originates from in-source fragmentation instead of incomplete derivatization.
Previously reported chemical probes for citrullination analysis had bulky structures that negatively impacted the solubility of analytes. Upon derivatization, extensive yet uninformative fragments were generated from the tag, which severely impeded the peptide backbone fragmentation and therefore led to low identification rates 6. In contrast, our novel design of biotin thiol tag features a compact structure which only generates two fragment/diagnostic ions during higher-energy collisional dissociation (HCD) (Fig. 1d and Supplementary Fig. 3a-c). Consequently, peptide backbones can preserve good fragmentation efficiency and produce rich b/y or c/z ion series during HCD or electron-transfer dissociation (ETD) (Supplementary Fig. 3a, d-g), respectively. The collected tandem MS spectra of the derivatized peptide standard delivered nearly full sequence coverage under HCD (Fig. 1e), ETD (Supplementary Fig. 3h) or electron-transfer/higher-energy collision dissociation (EThcD) (Supplementary Fig. 3i) fragmentation. Our results indicate that the biotin thiol tag derivatized citrullinated peptides can generate high-quality tandem MS spectra for sequence annotation, which enhances the identification confidence of citrullination sites when coupled with various fragmentation techniques.
Improved in vitro protein citrullination analysis with biotin thiol tag. Following the initial experiments, we streamlined the citrullination and homocitrullination analysis using our biotin thiol tag and MS-based bottom-up proteomics approach (Fig. 2a). Proteins were extracted from biological samples and enzymatically digested to peptides. The biotin tag was incubated with the peptides under acidic conditions and reacted with citrulline or homocitrulline residues. Excess tag was removed by strong cation exchange (SCX), and derivatized citrullinated and homocitrullinated peptides were enriched by streptavidin resin.
The enriched peptides were then released for liquid chromatography coupled with tandem MS (LC-MS/MS) analysis and data processing.
We tested this procedure using recombinant human histone H3 protein with or without in vitro PAD treatment. Although the recombinant protein, which is expressed in Escherichia coli, is supposed to bear no citrullination, we identified three citrullination sites in our experiment (Fig. 2b). This is likely due to the presence of unknown PAD isoenzymes in prokaryotes, which has been reported in recent literature 36,37. Despite this unexpected result, the other arginine residues were confirmed to be non-citrullinated, which still makes this protein a good negative control. After in vitro PAD treatment, we found that all the arginine residues were converted to citrulline, with abundant peptides confidently identified as citrullinated (Fig. 2c), indicating the high efficacy of our method. Interestingly, some peptides were identified with citrullination sites located at peptide C-termini and two representative tandem MS spectra of high quality are shown (Fig. 2d, e). It remains controversial whether trypsin is able to cleave after citrulline residues. While some researchers believe citrulline is resistant to trypsin digestion due to its neutral-charge property and even use it as a rule to exclude their identifications 28,38, others have reported some C-terminal citrullination sites though manual inspection of the spectra is usually required 18,31. Our results suggest that citrulline residues could potentially be cleaved by trypsin.
Exploring different fragmentation techniques and enzymatic digestion methods for optimized citrullination analysis from complex biological samples. We moved forward to evaluate our method with complex biological samples. We first compared three MS fragmentation methods, including stepped HCD, HCD product ion-triggered ETD (HCD-pd-ETD) and HCD product ion-triggered EThcD (HCD-pd-EThcD), using mouse brain digest. All three methods were able to achieve in-depth citrullination analysis with decent numbers of identifications (Supplementary Fig. 4a) while the stepped HCD method slightly outperformed the other two, likely due to shorter cycle time. Different methods show certain overlaps but are also complementary to one another (Supplementary Fig. 4b, c), suggesting the importance of choosing an appropriate one depending on specific applications. When comparing the same citrullination site identified with various fragmentation techniques, we observed that they all produced high-quality spectra though EThcD showed even better sequence coverage as expected (Supplementary Fig. 4d-f).
Thus, we conclude that stepped HCD confers optimal performance for citrullination analysis of complex samples due to its faster acquisition rate and shorter duty cycle while EThcD shines in providing more informative fragment ions and hence is more beneficial for relatively simple systems.
We then sought to optimize the enzymatic digestion methods. Lower identification numbers were observed when only using LysC to digest the samples (Supplementary Fig. 5a), probably because LysC digestion produces longer peptides, which results in lower fragmentation efficiency. Similar to the observation from in vitro protein analysis, we noticed that 51% of the citrullination sites were identified at peptide C-termini in LysC/trypsin digested samples and this percentage rose to 64% when using trypsin only (Supplementary Fig. 5b, c). These findings were consistent with the fact that trypsin digested samples provided slightly more identifications compared to LysC/trypsin digestion (Supplementary Fig. 5a) since a higher missed cleavage rate of citrulline residues resulted in longer peptides in the latter. When searching the results of LysC digestion with tryptic peptide parameters, we found almost all of the citrullination sites were still identified in the middle of the peptide sequence (Supplementary Fig. 5d), which demonstrates that no artificial cleavage of citrulline residues happens after enzymatic digestion. Some citrullination sites were confidently identified with different digestion protocols (Supplementary Fig. 5e-g), which further supports our observations of trypsin-cleavable C-terminal citrullinated arginine. Again, our results provide stronger evidence that some citrulline residues could be cleaved by trypsin though the mechanism needs further investigation.
Taking into consideration the citrullination identification rate and economic cost, we determined that using LysC/trypsin digestion and the stepped HCD fragmentation technique would be the optimal solution for processing the citrullination proteome. In addition, we evaluated the reproducibility by analyzing three biological replicates and the good overlap among these replicates indicates the robustness of our optimized methods (Supplementary Fig. 6). All the detailed data described in this section are provided in Supplementary Data 1.
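The share of citrullination sites found at peptide C-termini, as compared above between digestion protocols, reduces to a simple tally once each identification is expressed as a peptide length and a site position. The sketch below assumes that representation; the input format and function name are hypothetical and do not reflect the search software used in the study.

```python
def c_terminal_fraction(sites):
    """Fraction of modification sites located on the last residue of their peptide.

    `sites` is an iterable of (peptide_length, site_position) pairs,
    with 1-based positions counted from the peptide N-terminus.
    """
    sites = list(sites)
    if not sites:
        return 0.0
    n_c_terminal = sum(1 for length, position in sites if position == length)
    return n_c_terminal / len(sites)


if __name__ == "__main__":
    # Hypothetical identifications, for illustration only.
    identifications = [(10, 10), (8, 3), (12, 12), (9, 9)]
    print(f"{c_terminal_fraction(identifications):.0%} of sites are C-terminal")
```

Running the same tally on each digestion protocol separately would give percentages comparable to those reported for LysC/trypsin and trypsin-only samples.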
Large-scale citrullinome profiling of different mouse tissues. Next, we asked whether the developed method can delineate the citrullination landscape from biological samples and holds potential to elucidate the regulatory mechanisms of citrullination in cells. We performed an in-depth citrullinome analysis of six body organs and five brain regions in mice, generating a first tissue-specific atlas of the mouse citrullinome. In total, we identified 1,198 citrullination sites from 619 citrullinated proteins with high confidence (Fig. 3a and Supplementary Data 2), which is a dramatic increase compared to previous studies. More importantly, about 60% of these proteins were not reported to be proteins with PTMs retrievable from the UniProt database (Supplementary Fig. 7), which suggests that our results greatly expand the understanding of citrullination and how these substrate proteins are subjected to modulation via PTM. Intriguingly, we found that each examined brain region yielded about twice as many identifications as the other organs (Fig. 3a); however, the total number of citrullinated proteins in the brain is lower than that in the body (Supplementary Fig. 7). To investigate the seemingly contradictory results, we generated two arc plots where the width of ribbons connecting two tissues is proportional to the number of overlapping proteins or sites between them (Supplementary Fig. 8a, b). We observed a larger degree of overlap between brain regions with many more shared proteins and sites in between (Supplementary Fig. 8c-f). This could indicate that protein citrullination functions importantly and similarly across multiple brain regions, while in body organs it is involved in diverse biological processes. Our results greatly expand the knowledge of the substrate proteome for citrullination although the overlapped fraction with the UniProt repository is negligible (Fig. 3b).
This is likely because nearly 40% of the citrullination sites described in UniProt are based on similarity extrapolation without experimental evidence, which are inconsistent with the identified in vivo citrullination proteome. In addition, many of those reported sites are located on histone proteins, especially at protein termini, that may escape detection with our bottom-up strategies (Fig. 3b). Figure 3c captures the prevalence of singly- and multiply-citrullinated proteins where 60% of the identified proteins were observed with only one citrullination site.
The newly discovered citrullination proteome serves as a precious reservoir to conjecture the regulatory mechanisms of citrullination. For instance, we identified ten citrullination sites on MBP while there are only four reported in the UniProt database (Fig. 3d). Our results provided high-quality tandem MS spectra, which not only confirmed the presence of known modification sites (Fig. 3e), but also identified unknown sites with confidence (Fig. 3f). These findings may partially explain why MBP is more susceptible to hypercitrullination when PADs are dysregulated under pathological conditions and thus can help better understand the mechanisms in related diseases. Two citrullination sites described in UniProt were not detected in our study, which could result from the complementarity of various analytical tools. But again, these sites from UniProt are all based on similarity extrapolation from human and our results might indeed indicate a species-specific profile of protein citrullination. Another interesting example is glial fibrillary acidic protein (GFAP), which is an astrocyte-specific protein marker and is involved in astrocyte-neuron interactions. Increased expression of citrullinated GFAP was also observed in brains from patients with AD 15,17. In this study, we identified 14 citrullination sites on GFAP compared to four described in UniProt (Supplementary Fig. 9), which reveals the importance of citrullination in regulating GFAP functions and understanding the pathology of AD and possibly other astrocyte disorders. In addition, we detected many novel citrullinated proteins for the first time. For example, we identified one citrullination site each on apolipoprotein E (Supplementary Fig. 10a) and microtubule-associated protein tau (Supplementary Fig. 10b).
These two proteins have been proven to be closely associated with the initiation and progression of AD 39-41 and our results suggest the possible roles of their citrullinated forms in the pathogenesis of such neurodegenerative diseases. We also identified two novel citrullination sites on NAD-dependent protein deacetylase sirtuin-2, which functions as an essential enzyme targeting histones, tubulin and many key transcription factors, and therefore plays a critical role in many biological processes (Supplementary Fig. 10c) 42.
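The tissue comparison described above rests on two simple counts: how many proteins (or sites) each pair of tissues shares, which sets the ribbon widths in the arc plots, and how many proteins carry exactly one site. A minimal sketch of both is given here, assuming identifications are available as sets of identifiers per tissue. The identifiers and function names are hypothetical and are not taken from the supplementary data.

```python
from collections import Counter
from itertools import combinations


def pairwise_overlap(ids_by_tissue):
    """Count identifiers shared by every pair of tissues.

    `ids_by_tissue` maps a tissue name to a set of protein (or site) identifiers.
    Keys of the result are alphabetically ordered tissue pairs.
    """
    return {
        (a, b): len(ids_by_tissue[a] & ids_by_tissue[b])
        for a, b in combinations(sorted(ids_by_tissue), 2)
    }


def single_site_fraction(sites):
    """Fraction of proteins carrying exactly one site.

    `sites` is an iterable of (protein, position) pairs; duplicates are ignored.
    """
    per_protein = Counter(protein for protein, _ in set(sites))
    if not per_protein:
        return 0.0
    return sum(1 for n in per_protein.values() if n == 1) / len(per_protein)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    example = {
        "cerebellum": {"P1", "P2", "P3"},
        "hypothalamus": {"P2", "P3", "P4"},
        "liver": {"P3", "P5"},
    }
    for pair, shared in pairwise_overlap(example).items():
        print(pair, shared)
```

Sets are used because a protein identified in several replicates of one tissue should count once for that tissue.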
We then performed a motif analysis and found there were no conserved amino acid sequence patterns flanking identified citrullination sites (Fig. 3g), which is consistent with the observation that PAD treatment universally citrullinates arginine residues on histone H3 in vitro (Fig. 2c). To better discern the general functions that citrullinated proteins are involved in, we generated heatmaps showing multi-organ gene ontology (GO) analyses. The twenty most significantly enriched cellular components (Fig. 3h) or biological processes (Fig. 3i) are shown where the color coding indicates the p values of a certain term in different tissues. We found that there are clear disparities between brain and body while citrullinated proteins are more involved in brain functions. Specifically, citrullinated proteins are concentrated in axon, myelin sheath, dendrite and synapse, and consequently function importantly in the central nervous system. Furthermore, they also participate in many critical metabolic processes including respiration and are observed to be enriched in mitochondria. In accordance with this, we also identified eight citrullination sites on an essential glycolytic enzyme, pyruvate kinase (PKM). Interestingly, many of these sites on PKM are located in the proximity of its substrate binding pockets (Supplementary Fig. 11a), which raises the likelihood that citrullination can influence the kinase activity and supports a recent study concluding that citrullination regulates glycolysis 43. For instance, R120 and R294 are located near the catalytic pocket (Supplementary Fig. 11b) while R455 and R461 are close to the allosteric center (Supplementary Fig. 11c, d). R399 was also shown to be very important in stabilizing the highly active tetrameric form (Supplementary Fig. 11e, f) 44. Our results greatly expand current understanding of protein citrullination by demonstrating its widespread distribution (Supplementary Fig. 12) and involvement in many other biological processes (Supplementary Fig. 13), molecular functions (Supplementary Fig. 14) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Supplementary Fig. 15).
Additionally, we noticed that 30 citrullination sites are colocalized with other arginine modifications, especially omega-N-methylarginine (Supplementary Fig. 16a). For example, we identified five citrullination sites on heterogeneous nuclear ribonucleoproteins A2/B1 (Hnrnpa2b1) and four of them were also reported as arginine methylation sites (Supplementary Fig. 16b). Hnrnpa2b1 was shown to influence RNA metabolism and transport, and arginine methylation could regulate the nucleocytoplasmic distribution of this protein 45. Our results raise the possibility that citrullination indirectly participates in biological processes through an interplay with other protein modifications such as arginine methylation.
Profiling of protein homocitrullination in different mouse tissues. Homocitrullination is highly similar to citrullination structurally though it occurs on lysine residues. Therefore, current methods using antibodies to detect protein homocitrullination suffer from poor specificity while MS-based approaches also result in unsatisfying identification rates due to its low abundance 25. In contrast, our biotin thiol tag takes advantage of its high specificity towards ureido groups on both citrulline and homocitrulline, which allows for simultaneous enrichment and characterization of these two PTMs. We identified 108 homocitrullination sites from 79 proteins across all the tissues (Supplementary Data 3), which fills a gap in the protein homocitrullination database. Similarly, more sites and proteins were identified in brain regions compared to body organs, which suggests its intimate association with brain functions (Fig. 4a). We also observed relatively high identification numbers in heart, which may indicate that this PTM is associated with processes such as transporting oxygenated blood and hormones to the body (Fig. 4a). Many other PTMs are described in UniProt on these homocitrullination sites detected in our study, indicating again a potential PTM crosstalk. For instance, we identified two homocitrullination sites with high confidence on histone H4 (Fig. 4b, c) while both are colocalized with several lysine modifications (Fig. 4b). These modifications were shown to play critical roles, which could modulate the packaging of chromatin by either directly altering chemical structures of histones or recruiting PTM-specific binding proteins 46-51. Our findings of competing homocitrullination sites on histones provide new insights into the complex regulatory mechanisms in dynamic chromatin-templated processes.
No obvious sequence patterns surrounding homocitrullination sites were observed either, though there is a higher propensity for the identified sites to be located near protein C-termini (Fig. 4d). As expected, homocitrullinated proteins are concentrated in myelin sheath and may function importantly in the nervous system (Fig. 4e). They also participate in functions related to oxygen binding (Fig. 4e), which is consistent with more homocitrullinated proteins being identified in heart (Fig. 4a). Interestingly, we found that they are more likely to be located in mitochondria and correspondingly associated with processes such as the tricarboxylic acid cycle (Fig. 4e). Homocitrullinated proteins may interfere with cell-cell interactions as well, which can be discerned from their prevalence in extracellular matrix components (Fig. 4e).
Multiplexed quantitative citrullination analysis using chemical labeling strategies. We then sought to achieve multiplexed quantitative analysis by combining our methods with chemical labeling strategies. In theory, samples can be differentially labeled and combined before being derivatized and enriched using our biotin thiol tag (Fig. 5a). For isotopic labeling such as reductive dimethylation, quantification can be achieved during survey scans 52. For isobaric labeling approaches such as tandem mass tag (TMT) 53 or N,N-dimethyl leucine (DiLeu) 54, quantitative information can be obtained from reporter ions upon fragmentation (Fig. 5a).
In this study, we explored the quantitation capability of duplex dimethyl labeling which introduces a 4 Da mass difference between heavy isotopic labeling and light labeling (Supplementary Fig. 17). We first tested this pipeline with the citrullinated peptide standard and found the standard could be completely dimethylated without showing any observable side reactions (Supplementary Fig. 18a, b), which ensures no interference with the following steps. When differentially labeling the standard and mixing with known ratios, accurate quantitation was achieved (Supplementary Fig. 18c-e) and reliable results were obtained after biotin thiol tag derivatization (Fig. 5b). We moved forward to evaluate this strategy using complex biological samples (Supplementary Fig. 19a). Although we observed lower identification and quantification numbers, which was likely due to increased complexity of the spectra (Supplementary Fig. 19b), the quantified citrullinated peptides exhibited both great accuracy and precision compared to theoretical ratios (Fig. 5c). In addition, the identification and quantification rates can be easily improved by increasing the amount of starting material or utilizing a longer separation gradient. It is also worthwhile to note that dimethyl labeling conditions need to be carefully controlled to quantify homocitrullinated peptides since dimethylated lysine may affect further biotin thiol tag reaction.
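As a rough guide to where the heavy and light partners of a duplex dimethyl pair appear in a survey scan, the sketch below computes their spacing on the m/z axis. It assumes that every free amine (the peptide N-terminus and each lysine side chain) is labeled and that the heavy label is 4.0251 Da heavier per labeled amine, the value for four deuterium-for-hydrogen substitutions. The paper states only a nominal 4 Da difference, so the exact constant and the function names are assumptions.

```python
# Assumed heavy-minus-light mass difference per dimethylated amine (Da):
# four hydrogens replaced by deuterium, 4 * (2.014102 - 1.007825).
DELTA_PER_SITE = 4.0251


def labeling_sites(n_lysines: int, free_n_terminus: bool = True) -> int:
    """Number of amines available for dimethylation on one peptide."""
    return n_lysines + (1 if free_n_terminus else 0)


def pair_spacing_mz(n_lysines: int, charge: int, free_n_terminus: bool = True) -> float:
    """Spacing on the m/z axis between the heavy and light forms of a peptide."""
    return DELTA_PER_SITE * labeling_sites(n_lysines, free_n_terminus) / charge


def heavy_to_light_ratio(heavy_intensity: float, light_intensity: float) -> float:
    """Relative abundance estimated from the paired survey-scan intensities."""
    return heavy_intensity / light_intensity


if __name__ == "__main__":
    # The peptide standard SAVRACitSSVPGVR contains no lysine, so only
    # the N-terminus is expected to carry a label.
    for z in (1, 2, 3):
        print(f"z = {z}: {pair_spacing_mz(0, z):.4f} m/z")
```

Residues whose side-chain amine is already modified, such as homocitrullinated lysine, would not be expected to take a label and should be left out of the lysine count in this simplified picture.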

Discussion
Herein, we report the design and development of a biotin thiol tag that specifically reacts with citrulline and homocitrulline and allows for enrichment of target molecules. After demonstrating its efficacy using a standard peptide and a recombinant protein, we streamline the workflow to detect these two PTMs from complex biological samples. We then apply this protocol to profile protein citrullination and homocitrullination of five brain regions and six body organs in mice. In total, we identify 1,198 citrullination sites and 108 homocitrullination sites from 619 and 79 proteins, respectively, which is the largest dataset to date. Our study reveals the critical roles these two PTMs may play in the nervous system and indicates that they also function importantly in many metabolic processes including respiration and glycolysis. Despite a few intrinsic drawbacks of mass-difference isotopic labeling techniques, we demonstrate that reductive dimethylation can be utilized in conjunction with our method to achieve simultaneous high-throughput quantitative analysis. We will also integrate isobaric labeling strategies to alleviate these shortcomings and further increase the multiplexing capability for quantitative PTM analyses in the future. Collectively, our results expand current understanding of protein citrullination and homocitrullination by mapping their distribution in different tissues and their participation in biological processes, both of which are more widespread than hitherto anticipated. More importantly, we envision our method can serve as a simple yet powerful tool for unambiguous identification and quantification of these modifications, which will also inspire and benefit future investigations into their functional roles under physiological and pathological conditions.

Methods
Synthesis of biotin thiol tag. N,N-diisopropylethylamine (0.88 mM) was added to a solution of biotin-NHS ester (0.29 mM) and cysteamine (0.44 mM) in CH2Cl2 (5 mL) and stirred at 40°C for 24 h. The crude product was purified using a CombiFlash system with a gradient of dichloromethane from 0 to 20% in methanol. Fractions containing pure product (as detected by UV) were collected (68% yield). 1H NMR data was obtained from a Varian Inova 500 MHz NMR spectrometer. 13C NMR data was obtained from a Bruker Avance III HD 400 MHz NMR spectrometer. The spectra were recorded in 10 mg cm−3 CD3OD solutions with a probe temperature of 300 K and referenced to internal standard tetramethylsilane. 1  (Millipore) was added into the biotin thiol tag solution to a final concentration of 10 mM before drying to prevent oxidation and the tag was stored at -80°C for long-term storage.
Derivatization of citrullinated peptide standard using biotin thiol tag. Citrullinated peptide standard SAVRACitSSVPGVR (Genscript) was dissolved in water to a concentration of 1 mg/mL. A solution of 2,3-butanedione was prepared by mixing 1 µL of 2,3-butanedione with 114 µL 12.5% trifluoroacetic acid (TFA). Three hundred micrograms of biotin thiol tag was dissolved with 40 µL 12.5% TFA. One microliter of citrullinated peptide standard and 10 µL 2,3-butanedione solution were subsequently added to initiate the derivatization reaction. The mixture was vortexed in the dark at 37°C for 6 h and then dried in vacuo. To remove the excess tag, SCX was performed using TopTips (Poly LC) containing PolySULFOETHYL A beads following the manufacturer's protocol. Briefly, SCX tips were equilibrated three times with 100 µL loading buffer containing 50% acetonitrile (ACN), 0.2% formic acid (FA) and 10 mM ammonium formate. The derivatized citrullinated peptide standard was then resuspended in 200 µL loading buffer and added to SCX tips followed by washing 10 times with 100 µL loading buffer. Peptide was finally eluted three times with 50 µL 25% ACN and 0.4 M ammonium formate. Flowthrough was collected and dried in vacuo. All centrifugation steps were performed at 400 g for 2 min.
Enrichment of derivatized citrullinated peptide standard. The enrichment process was performed as previously described with slight modifications 55. Briefly, 75 µL streptavidin agarose (Sigma) was washed five times with 1 mL 1× phosphate-buffered saline (PBS). Each time the tube containing beads was vortexed and centrifuged at 3,000 g for 2 min, and supernatant was removed. Peptide sample was resuspended in 1 mL PBS and loaded onto the streptavidin agarose followed by incubation at room temperature for 2 h with rotation. The agarose was subsequently washed three times with 1 mL PBS, three times with 1 mL 5% ACN in PBS, and 10 times with 1 mL water. Peptides were finally released four times with 300 µL 80% ACN, 0.2% TFA and 0.1% FA. The first release was performed at room temperature for 5 min, while the other three release processes were conducted at 95°C for 5 min with shaking. The eluents were combined and dried in vacuo.
MALDI-MS analysis of citrullinated peptide standard and its derivatized form. Samples were resuspended in 50 µL 50% ACN and prepared by premixing 1 µL of them with 1 µL of 2,5-dihydroxybenzoic acid matrix (150 mg/mL in 50% methanol, 0.1% FA). One microliter of each matrix/sample mixture was spotted onto the MALDI target plate and detected on a MALDI-LTQ Orbitrap XL mass spectrometer (Thermo).
Ionization was performed using a laser energy of 15 µJ. Spectra were acquired with a mass range of m/z 1000 − 2000 at a resolution of 30k (at m/z 400).
Fragmentation of derivatized citrullinated peptide standard. The derivatized citrullinated peptide standard was resuspended in 1 mL of 0.1% FA, 50% ACN and directly injected into an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo). Full MS scans were performed over a mass range of m/z 300-1500 at a resolution of 60k with an RF lens setting of 30. The AGC target was set to 2 × 10^5 and the maximum injection time was 100 ms. The precursor ion was isolated in the quadrupole for HCD, ETD and EThcD fragmentation. Tandem MS spectra were collected, and fragment ions were manually annotated based on their accurate mass.
PAD treatment and digestion of histone H3. Ten micrograms of recombinant human histone H3 (New England Biolabs) were incubated with recombinant human PAD2/PAD4 enzyme (Cayman Chemical) overnight at room temperature at a ratio of 2 µg of enzyme per mg of histone. Histone with or without PAD treatment was then diluted with Tris buffer to a final concentration of 100 mM Tris and 5 mM CaCl2 (pH 7.5). LysC/trypsin mixture (Promega) was added at a 50:1 ratio (protein:enzyme, w/w) and incubated overnight at 37°C. Digestion was quenched by adding TFA to pH < 3, and samples were desalted using Omix Tips (Agilent) before drying in vacuo.
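The enzyme amounts implied by the two ratios above can be worked out with a short calculation. The following is a minimal sketch, not part of the published protocol; the function names are hypothetical and only the ratios (2 µg PAD per mg histone, 50:1 protein:enzyme w/w) and the 10 µg histone input come from the text.

```python
def pad_mass_ug(histone_ug: float, pad_ug_per_mg_histone: float = 2.0) -> float:
    """PAD enzyme mass (µg) for a given histone mass (µg).

    The default of 2 µg enzyme per mg histone is the ratio stated in the text.
    """
    return histone_ug / 1000.0 * pad_ug_per_mg_histone


def protease_mass_ug(protein_ug: float, protein_to_enzyme: float = 50.0) -> float:
    """Protease mass (µg) for a protein:enzyme (w/w) ratio such as 50:1."""
    if protein_to_enzyme <= 0:
        raise ValueError("protein_to_enzyme must be positive")
    return protein_ug / protein_to_enzyme


if __name__ == "__main__":
    histone_ug = 10.0  # histone H3 input stated in the text
    print(f"PAD enzyme:   {pad_mass_ug(histone_ug):.3f} µg")
    print(f"LysC/trypsin: {protease_mass_ug(histone_ug):.3f} µg")
```

For 10 µg of histone this gives about 0.02 µg of PAD enzyme and 0.2 µg of LysC/trypsin mixture.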
Protein extraction and digestion of mouse tissues. For method optimization, brain was collected from one mouse. For tissue-specific citrullination and homocitrullination profiling, five brain regions and six body organs were collected: Bcortex (cerebral cortex), Scortex (hippocampus and thalamus), hypothalamus, cerebellum, medulla, spleen, pancreas, kidney, lung, heart, and liver. Each tissue was collected in triplicate from three mice. Tissues were dissolved in 150 µL of extraction buffer (4% SDS, 50 mM Tris buffer) and sonicated using a probe sonicator (Thermo). Protein extracts were reduced with 10 mM dithiothreitol (DTT) for 30 min at room temperature and alkylated with 50 mM iodoacetamide for another 30 min in the dark before being quenched with DTT. Proteins were then precipitated with 80% (v/v) cold acetone (-20 ℃) overnight. Samples were centrifuged at 14,000 g for 15 min, after which the supernatant containing SDS (from the extraction buffer) was discarded. Pellets were rinsed with cold acetone again and air-dried at room temperature. Guanidine hydrochloride (GuHCl, 5 M) was added to dissolve the pellets, and 50 mM Tris buffer was used to dilute the samples to a GuHCl concentration < 0.5 M. On-pellet digestion was performed with either trypsin, LysC or LysC/trypsin mixture (Promega) at a 50:1 ratio (protein:enzyme, w/w) at 37 ℃ overnight. The digestion was quenched with 1% TFA and samples were desalted with Sep-Pak C18 cartridges (Waters). Concentrations of the peptide mixtures were measured by peptide assay (Thermo). Four hundred micrograms of peptide were aliquoted for each sample and dried in vacuo.
Duplex isotopic dimethyl labeling. Forty microliters of H2O were added to dissolve peptide samples. Formaldehyde or formaldehyde-d2 was diluted to 1% (v/v) with H2O, and 20 µL of the solution was added to samples for light or heavy labeling, respectively. To each sample, 20 µL of borane pyridine (30 mM) was then added to initiate the labeling reaction. Following incubation at 37 ℃ for 20 min, labeling was quenched by addition of 20 µL of ammonium bicarbonate solution (200 mM). Labeled peptides were then combined in a 1:1, 2:1 or 5:1 ratio (v/v, light/heavy). Samples were acidified with FA to pH < 3, desalted with Sep-Pak C18 cartridges and dried in vacuo for later biotin thiol tag derivatization.
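The spacing expected between light and heavy peptide pairs in the survey scan follows from the number of labeled amines. The sketch below is illustrative only: it assumes the standard monoisotopic mass shifts for light dimethylation (+28.0313 Da) and for formaldehyde-d2 with a non-deuterated reductant (+32.0564 Da), assumes every peptide N-terminus and lysine side chain is fully labeled, and uses the hypothetical placeholder `X` for citrulline in the one-letter sequence.

```python
# Monoisotopic mass added per labeled amine (Da); standard values, assumed here.
LIGHT_SHIFT = 28.031300  # dimethyl, CH2O + non-deuterated reductant
HEAVY_SHIFT = 32.056407  # dimethyl, CD2O + non-deuterated reductant


def count_label_sites(sequence: str) -> int:
    """Labeled amines: one peptide N-terminus plus one per lysine (K)."""
    return 1 + sequence.upper().count("K")


def heavy_light_spacing(sequence: str, charge: int = 1) -> float:
    """m/z spacing between heavy- and light-labeled forms of a peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    sites = count_label_sites(sequence)
    return sites * (HEAVY_SHIFT - LIGHT_SHIFT) / charge


if __name__ == "__main__":
    # Peptide standard from the text; 'X' is a placeholder for citrulline.
    standard = "SAVRAXSSVPGVR"
    print(count_label_sites(standard))
    print(round(heavy_light_spacing(standard, charge=1), 4))
```

Because the peptide standard contains no lysine, only its N-terminus is labeled, which gives a spacing of roughly 4 Da for the singly charged ion.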
Derivatization and enrichment of citrullinated peptides in histone and mouse tissues. Three hundred micrograms of biotin thiol tag were added to each sample tube containing peptides from mouse tissues or histone, and the mixture was resuspended in 40 µL of 12.5% TFA solution. Ten microliters of 2,3-butanedione solution, prepared as described above, were added to initiate the reaction. The remaining derivatization, SCX and enrichment steps were the same as those for the citrullinated peptide standard.
LC-MS/MS analysis. Samples were analyzed on an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo) coupled to a Dionex UltiMate 3000 UPLC system. Each sample was dissolved in 3% ACN, 0.1% FA in water before being loaded onto a 75 µm inner diameter homemade microcapillary column which is [...] samples, multiplicity was set to two with dimethLys0/dimethNter0 specified as light labels, and dimethLys4/dimethNter4 as heavy labels. Search results were filtered to 1% false discovery rate (FDR) at both peptide and protein levels. Peptides that were found as reverse or potential contaminant hits were filtered out, and the citrullination or homocitrullination site localization probability threshold was set to 0.75.
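The post-search filtering described above can be sketched as follows. This is an illustration rather than the authors' pipeline: the `SiteHit` record and its field names are hypothetical, the 1% FDR filter is assumed to have been applied upstream by the search engine, and treating the 0.75 threshold as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SiteHit:
    """Hypothetical record for one modified-site identification."""
    peptide: str
    is_reverse: bool          # decoy (reversed-sequence) hit
    is_contaminant: bool      # potential contaminant hit
    localization_prob: float  # site localization probability, 0 to 1


def keep_site(hit: SiteHit, min_prob: float = 0.75) -> bool:
    """True if the hit is a target, non-contaminant, well-localized site."""
    return (
        not hit.is_reverse
        and not hit.is_contaminant
        and hit.localization_prob >= min_prob  # inclusive cutoff is assumed
    )


def filter_sites(hits: Iterable[SiteHit], min_prob: float = 0.75) -> List[SiteHit]:
    """Return only the hits that pass all three criteria."""
    return [h for h in hits if keep_site(h, min_prob)]


if __name__ == "__main__":
    # Made-up example hits, for illustration only.
    hits = [
        SiteHit("PEPTIDEA", False, False, 0.98),
        SiteHit("PEPTIDEB", True, False, 0.99),
        SiteHit("PEPTIDEC", False, True, 0.90),
        SiteHit("PEPTIDED", False, False, 0.60),
    ]
    print([h.peptide for h in filter_sites(hits)])
```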
All other parameters were set as default. Bioinformatic analyses including Sankey diagram, arcplots and stacked bar graphs were performed using R packages. Heatmaps showing multi-tissue GO analyses were generated using Metascape 56 (version 3.5), while GO analysis for homocitrullination was accomplished using DAVID bioinformatics resources 57 with an FDR cutoff of 0.05. Sequence motif analyses were done using WebLogo 58 . For homology modeling, the 3D structure of mouse PKM2 (residues 14-531) was modeled according to the crystal structures of human PKM2, which had a sequence identity of 97.7% and were the most similar crystal structures to mouse PKM2 retrievable from the Protein Data Bank. The homology model module of Discovery Studio 2016 was used for the multi-template structure construction, and the ligands including PYR, SER and FBP were copied from the input templates. The output model with the lowest PDF total energy and DOPE score was adopted, and energy minimization was conducted on the adopted structure using CHARMm (version 40.1). PyMOL (version 2.4.0a0) was used to measure the Euclidean distances (in Å) between the atoms of the selected arginine residues and the atoms of annotated ligands (such as substrates and the allosteric activator).
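The distance measurement itself is simple geometry. The sketch below is independent of PyMOL and shows how the shortest Euclidean distance between two sets of atom coordinates could be computed. The coordinates are made-up placeholders, and reporting the minimum over all atom pairs is an assumption about how the residue-ligand distances were summarized.

```python
import math
from itertools import product
from typing import Iterable, Sequence, Tuple

Coord = Tuple[float, float, float]


def min_distance(atoms_a: Sequence[Coord], atoms_b: Sequence[Coord]) -> float:
    """Shortest Euclidean distance (Å) between any atom in A and any atom in B."""
    if not atoms_a or not atoms_b:
        raise ValueError("both atom sets must be non-empty")
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))


if __name__ == "__main__":
    # Placeholder coordinates in Å, not taken from any real structure.
    arginine_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ligand_atoms = [(4.0, 0.0, 0.0), (1.0, 4.0, 0.0)]
    print(min_distance(arginine_atoms, ligand_atoms))
```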

Declarations
Data availability
The mass spectrometry proteomics data have been deposited to the ProteomeXchange 59 Consortium via the PRIDE 60 partner repository with the dataset identifier PXD023733. The data will be released publicly in time for online publication of the paper. For anonymous access, peer reviewers may use the following account information: Username: reviewer_pxd023733@ebi.ac.uk, Password: nGAPJjrm. Mus musculus and Homo sapiens databases used for database searching were downloaded from UniProt (https://www.uniprot.org/).

Code availability
All online-available software and R packages used to perform data analysis or generate the figures are indicated throughout the manuscript.

Multiplexed quantitative citrullination analysis using chemical labeling strategies. a, Schematic showing the pipeline for simultaneous qualitative and quantitative analysis of citrullination using isotopic or isobaric labeling. Relative quantification can be achieved during survey scans or tandem MS scans, respectively. b, Spectra showing quantitation accuracy in duplex dimethyl labeling using the citrullinated peptide standard. The peptide standard was dimethylated by either heavy or light isotope labeling, resulting in a 4 Da mass difference. Heavy (red) and light (blue) labeled peptides were mixed in three known ratios (1:1, 2:1 and 5:1) and subjected to derivatization using the biotin thiol tag. c, Boxplots showing quantitation accuracy and precision in duplex dimethyl labeling using mouse brain digest. Red dots indicate the detected ratios for each quantified citrullinated peptide. Top and bottom of boxes indicate the 3rd and 1st quartiles, respectively, and whiskers extend to the 95th and 5th percentiles. Horizontal lines within boxes denote the median.
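The boxplot summary in panel c (median, 1st and 3rd quartiles, and whiskers at the 5th and 95th percentiles) can be reproduced from a list of detected light/heavy ratios. This is a minimal sketch that assumes linear interpolation between ranked values, which is one of several common percentile conventions; the function names and input ratios are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0 to 100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 100.0:
        raise ValueError("q must be between 0 and 100")
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def boxplot_summary(ratios: Sequence[float]) -> Dict[str, float]:
    """Statistics matching the boxplot elements described in the legend."""
    return {
        "whisker_low": percentile(ratios, 5),
        "q1": percentile(ratios, 25),
        "median": percentile(ratios, 50),
        "q3": percentile(ratios, 75),
        "whisker_high": percentile(ratios, 95),
    }


if __name__ == "__main__":
    # Made-up ratios for illustration, not data from the study.
    detected_ratios = [1.0, 2.0, 3.0, 4.0, 5.0]
    for name, value in boxplot_summary(detected_ratios).items():
        print(f"{name}: {value:.2f}")
```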

Figures
Supplementary Files