Sample preparation
C57BL/6 male mice were purchased from CLEA Japan, Inc. (Tokyo, Japan). All procedures involving animals complied with the guidelines of the National Institutes of Health and were approved by the Animal Experimentation and Ethics Committee of Kitasato University School of Medicine. The whole liver of each mouse was homogenized on ice for 3 min using a BioMasher II (Nippi, Tokyo, Japan) with 1 mL of phase-transfer surfactant (PTS; 12 mM sodium deoxycholate, 12 mM sodium lauryl sulfate, and 200 mM triethylammonium bicarbonate [TEAB]) 23. Aliquots of the homogenate were sonicated in a Bioruptor sonicator (SONIC Bio Co., Kanagawa, Japan) for 30 min (30 s on/30 s off, high setting) in ice water. Insoluble materials were removed by centrifugation at 19,000g for 15 min at 4 °C. The protein concentration was measured using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and adjusted to 1 µg/µL with PTS. Protein extracts were flash-frozen in liquid nitrogen and stored at −80 °C until use.
For evaluation of the machine learning algorithms, proteins extracted from 20 µg of mouse liver were resuspended in 20 µL of PTS and incubated with 2 µL of 200 mM Bond-Breaker TCEP solution (TCEP, Thermo Fisher Scientific) for 30 min at 50 °C to cleave the disulfide bonds, and the solution was then incubated on ice for 10 min. The reduced proteins were alkylated with 2 µL of 375 mM iodoacetamide and 200 mM TEAB in the dark at room temperature for 30 min. The alkylation reaction was quenched by adding 2 µL of 400 mM L-cysteine and incubating in the dark for 10 min at room temperature. The sample was digested with 200 ng each of trypsin and lysylendopeptidase for 18 h at 37 °C. The reaction mixture was then mixed with a 1.5× volume of 1.7% trifluoroacetic acid (TFA) and centrifuged at 19,000g for 15 min at 4 °C. The supernatant was desalted using StageTips with a C18 Empore disk membrane, as described previously 23. The fraction was eluted using 50% acetonitrile (ACN) and 0.1% TFA and then freeze-dried. The freeze-dried sample was resuspended in 20 µL of 3% ACN and 0.1% formic acid (FA) by vortexing and by ultrasonic agitation in a Bioruptor sonicator (30 s on/30 s off, high setting) for 10 min each in ice water. The sample was analyzed using a quadrupole Orbitrap benchtop mass spectrometer (Q-Exactive, Thermo Fisher Scientific) equipped with an EASY-nLC 1000 system (Thermo Fisher Scientific). Tryptic peptides were injected directly onto an analytical column (C18, particle diameter 3 µm, 0.075 mm × 125 mm; Nikkyo Technos, Japan) and separated with a gradient of solvents A (0.1% FA) and B (0.1% FA and 90% ACN) (0-1 min, 5-10% B; 1-20 min, 10-25% B; 20-26 min, 25-50% B; 26-27 min, 50-80% B) at a flow rate of 300 nL/min using the EASY-nLC 1000. Peptides were introduced from the chromatography column to the Q-Exactive.
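For illustration only, the gradient program given above can be expressed as a piecewise-linear function of time. This minimal sketch assumes that each step is a linear ramp, which the text does not state explicitly; the names GRADIENT, percent_b, and acn_percent are hypothetical and are not part of the instrument method.

```python
# Gradient steps taken from the text: (start_min, end_min, start_%B, end_%B).
GRADIENT = [
    (0.0, 1.0, 5.0, 10.0),
    (1.0, 20.0, 10.0, 25.0),
    (20.0, 26.0, 25.0, 50.0),
    (26.0, 27.0, 50.0, 80.0),
]


def percent_b(t_min, gradient=GRADIENT):
    """Return %B at time t_min, assuming a linear ramp within each step."""
    if t_min < gradient[0][0] or t_min > gradient[-1][1]:
        raise ValueError("time lies outside the programmed gradient")
    for start, end, b_start, b_end in gradient:
        if start <= t_min <= end:
            return b_start + (b_end - b_start) * (t_min - start) / (end - start)


def acn_percent(t_min):
    """Approximate % ACN in the mobile phase; solvent B contains 90% ACN."""
    return percent_b(t_min) * 0.90


if __name__ == "__main__":
    for t in (0, 1, 10.5, 20, 26, 27):
        print(t, percent_b(t), acn_percent(t))
```

Such a function is only a convenience for relating retention times to the approximate organic content of the eluent.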
Some MS acquisition parameters were set as described previously 9. MS1 spectra were collected over the scan range 350-900 m/z at 70,000 resolution with an automatic gain control (AGC) target of 1 × 10⁶. The AGC target value for fragment spectra was set at 1 × 10⁵. The 20 most-intense ions with charge states of 2+ to 4+ that exceeded an intensity of 2.0 × 10³ were fragmented.
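The precursor selection rule described above (the most-intense ions within the allowed charge states and above an intensity threshold) can be sketched as follows. This is not the instrument firmware logic; the tuple format (m/z, intensity, charge) and the function name select_precursors are assumptions made for the example.

```python
def select_precursors(peaks, top_n=20, min_intensity=2.0e3, charges=(2, 3, 4)):
    """Pick up to top_n precursors from one MS1 scan for fragmentation.

    peaks: iterable of (mz, intensity, charge) tuples (hypothetical format).
    Ions must carry an allowed charge and exceed min_intensity.
    """
    eligible = [p for p in peaks if p[2] in charges and p[1] > min_intensity]
    # Most-intense ions first.
    eligible.sort(key=lambda p: p[1], reverse=True)
    return eligible[:top_n]


if __name__ == "__main__":
    scan = [
        (400.2, 5.0e4, 2),
        (512.8, 1.0e3, 2),   # below the intensity threshold
        (633.3, 9.0e5, 1),   # charge state not allowed
        (701.4, 3.0e3, 3),
    ]
    print(select_precursors(scan))
```

For the quantitative comparisons, the same sketch would be called with top_n=2 and min_intensity=2.0e5, following the values given for that experiment.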
For quantitative comparisons, proteins extracted from 20 µg of mouse liver dissolved in 20 µL of PTS were dimethylated with 8 µL of 0.6 M NaBH₃CN and 16 µL of 4% ¹²CH₂O (light-labeled) or 4% ¹³CH₂O (heavy-labeled) for 10 min at room temperature. The dimethylation reaction was quenched by adding 8 µL of 1% NH₃ and incubating for 1 min, and the light- and heavy-labeled samples were then mixed. A total of 58 µL of the mixed sample was precipitated by adding 700 µL of ACN followed by 25 µL of 5% TFA. After centrifugation at 19,000g for 15 min at 4 °C, the supernatant was discarded and the precipitate was collected. The precipitate was dissolved in 20 µL of PTS, and the subsequent alkylation, digestion, and LC-MS analysis were performed as described above for evaluation of the machine learning algorithms.
Peptides were introduced to the Q-Exactive from an analytical column (C18, particle diameter 3 µm, 0.075 mm × 125 mm; Nikkyo Technos). Tryptic peptides were separated with a gradient of solvents A and B (0-29 min, 5-30% B; 29-37 min, 30-55% B; 37-38 min, 55-80% B) at a flow rate of 300 nL/min using the EASY-nLC 1000. MS1 spectra were collected over the scan range 350-1400 m/z at 140,000 resolution with an AGC target of 3 × 10⁶. The two most-intense ions with charge states of 2+ to 4+ that exceeded an intensity of 2.0 × 10⁵ were fragmented. Other parameters were set as described for evaluation of the machine learning algorithms.
All raw data files obtained in the LC-MS/MS analyses were deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the jPOST partner repository (http://jpostdb.org) 24 with the dataset identifiers PXD027824 for ProteomeXchange and JPST001287 for jPOST.
Protein identification
LC-MS/MS data were searched against the mouse UniProt sequence database (release 2018; 25,131 entries, reviewed). Database searches were performed using the SEQUEST algorithm incorporated into Proteome Discoverer 1.4.0.288 software (Thermo Scientific) with the following parameters: enzyme, trypsin; maximum missed cleavage sites, 3 for evaluation of the machine learning algorithms or 2 for quantitative comparisons; precursor mass tolerance, 6 ppm; fragment mass tolerance, 0.02 Da; fixed modification, cysteine carbamidomethylation; variable modification, methionine oxidation. For quantitative comparisons, light-labeled dimethylation (+28 Da) at lysine and heavy-isotope-labeled dimethylation (+34 Da) at lysine were included in the search parameters. Peptide identifications were filtered to a false discovery rate (FDR) of <1%.
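To make the precursor mass tolerance concrete, the following minimal sketch converts a tolerance in ppm into an absolute window and checks whether an observed value falls inside it. It is a generic illustration of a ppm tolerance, not the SEQUEST implementation; the function names are hypothetical.

```python
def ppm_window(theoretical, tol_ppm=6.0):
    """Return the (low, high) bounds of a +/- tol_ppm window around a value."""
    delta = theoretical * tol_ppm / 1e6
    return theoretical - delta, theoretical + delta


def within_ppm(observed, theoretical, tol_ppm=6.0):
    """True if the observed value deviates by at most tol_ppm from theory."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


if __name__ == "__main__":
    print(ppm_window(1000.0))           # window of about +/- 0.006 at 1000
    print(within_ppm(1000.004, 1000.0)) # 4 ppm deviation
    print(within_ppm(1000.008, 1000.0)) # 8 ppm deviation
```

At 6 ppm, the window for a peptide of mass 1000 Da is about ±0.006 Da, far narrower than the fixed 0.02 Da tolerance applied to fragment ions.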
XICs for precursor ions were obtained using Skyline 20.1.0 (http://proteome.gs.washington.edu/software/skyline) 6,7 based on the identified peptide library. The spectrum library was imported from the msf file generated by Proteome Discoverer with a cutoff score of FDR = 0.99. Peptide settings were as follows: enzyme, trypsin KR/P; maximum missed cleavages, 2; minimum peptide length, 7; maximum peptide length, 30; modifications, carbamidomethyl (Cys) and oxidation (Met); maximum variable modifications, 5. Transition settings were as follows: precursor charges, 2+ to 4+; ion type, p (precursor); ion mass tolerance, 0.02 m/z; isotope peaks included, count 3; mass analyzer, Orbitrap; resolution, 70,000 at 200 m/z; use only scans within 5 min of the predicted retention time; isotope labeling enrichment, default.
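As a hedged illustration of what "isotope peaks included, count 3" implies, the sketch below computes the m/z values of the M, M+1, and M+2 precursor ions from the monoisotopic m/z and charge, and tests whether an observed peak falls within the 0.02 m/z tolerance. It approximates the isotope spacing by the ¹³C-¹²C mass difference and does not reproduce Skyline's internal isotope distribution calculation; the names are hypothetical.

```python
C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C, in Da


def isotope_mzs(mono_mz, charge, count=3):
    """m/z of the M, M+1, ... isotope peaks of a precursor ion."""
    return [mono_mz + k * C13_C12_DELTA / charge for k in range(count)]


def matches(observed_mz, target_mz, tol=0.02):
    """True if an observed peak lies within +/- tol (m/z) of the target."""
    return abs(observed_mz - target_mz) <= tol


if __name__ == "__main__":
    targets = isotope_mzs(500.0, charge=2)
    print(targets)
    print([matches(500.51, t) for t in targets])
```

The spacing between isotope peaks shrinks with charge, which is why the charge state must be known before the three XICs can be extracted.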
Extraction of informative features from chromatographic peaks
Nine types of informative features of the chromatographic peaks were extracted using Skyline: idotP, average mass error, signal-to-noise ratio, standard deviation of the intensity of FWHM of isotope peaks, average retention time, intensity at chromatographic peak boundary, jagging score, shape similarity, and co-elution score (Supplementary Table S1 and Supplementary Figure S7). The jagging score was defined as the number of data points lower than the FWHM within an integral interval of the peak. The shape similarity score was defined as the Pearson product-moment correlation coefficient generated based on the similarity in shapes of chromatographic peaks of isotopes. The co-elution score was defined as the average shift in the cross-correlation function for each pair of isotopic peak traces within the window of the selected peak, as described in a previous report 25. For all features, missing values were replaced with zero.
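The shape similarity and co-elution definitions above can be sketched in plain Python as follows. This is not the FeatureExtract.py script from the Supplementary Materials. It assumes that each isotope trace is an equally sampled list of intensities, that shape similarity is averaged over all isotope pairs, and that the shift is searched within a hypothetical max_lag of 5 scans; all function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def fill_missing(values):
    """Replace missing values (None or NaN) with zero."""
    return [0.0 if v is None or v != v else v for v in values]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat trace has no defined correlation
    return sxy / sqrt(sxx * syy)


def best_lag(x, y, max_lag):
    """Lag (in scans) at which the cross-correlation of x and y is largest."""
    n = len(x)

    def xcorr(lag):
        return sum(x[i] * y[i + lag] for i in range(n) if 0 <= i + lag < n)

    return max(range(-max_lag, max_lag + 1), key=xcorr)


def shape_similarity(traces):
    """Mean Pearson correlation over all pairs of isotope traces (assumption)."""
    pairs = list(combinations(traces, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def coelution_score(traces, max_lag=5):
    """Mean absolute cross-correlation shift over all pairs of isotope traces."""
    pairs = list(combinations(traces, 2))
    return sum(abs(best_lag(a, b, max_lag)) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    m0 = [0, 1, 4, 9, 4, 1, 0]
    m1 = [v * 0.5 for v in m0]          # same shape, lower intensity
    m2 = [0, 0, 1, 4, 9, 4, 1]          # shifted by one scan
    print(shape_similarity([m0, m1]), coelution_score([m0, m1, m2]))
```

Perfectly co-eluting isotope traces give a shape similarity near 1 and a co-elution score of 0; interfering signals lower the former and raise the latter.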
Peak extraction and assignment
All values of each feature parameter were scaled using min-max normalization. The dimensionality of the original feature space was then reduced using PCA. The selected PCA components were used as inputs to the SVM 19, ANN 20, KNN 18, and GNB 21 algorithms. For the other machine learning algorithms, RF 16 and XGB 17, the feature values were used as inputs without min-max normalization or PCA. k-fold cross-validation (k = 5) was used to avoid overfitting, and the hyper-parameters were optimized as described previously 26. Cross-validation and hyper-parameter optimization were applied to the five machine learning algorithms other than GNB. True peaks and noise peaks in the training data were annotated manually.
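The workflow above (scaling, then choosing hyper-parameters by 5-fold cross-validation) can be illustrated with a deliberately small, dependency-free sketch. It is not the MachineLearning.py script: the PCA step is omitted for brevity, a plain KNN classifier stands in for the scikit-learn models, and the candidate values of k, the random seed, and all function names are assumptions made for the example.

```python
import random
from collections import Counter


def min_max_scale(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in rows]


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(t, x)), label)
        for t, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(xs, ys, k_neighbors, n_folds=5, seed=0):
    """Mean accuracy over n_folds held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        hits = sum(knn_predict(tx, ty, xs[i], k_neighbors) == ys[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / n_folds


def tune_k(xs, ys, candidates=(1, 3, 5)):
    """Pick the candidate k with the best cross-validated accuracy."""
    return max(candidates, key=lambda k: cross_val_accuracy(xs, ys, k))


if __name__ == "__main__":
    # Toy features: label 1 = true peak, label 0 = noise peak.
    feats = [[i, 10 + i] for i in range(10)] + [[100 + i, 50 + i] for i in range(10)]
    labels = [1] * 10 + [0] * 10
    scaled = min_max_scale(feats)
    print(tune_k(scaled, labels), cross_val_accuracy(scaled, labels, 3))
```

As in the text, the sketch scales all values before cross-validation; fitting the scaler inside each training fold is a stricter alternative.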
Quantification
Peptide pairs for which both the light- and heavy-labeled peptides were identified were chosen for comparative quantification. The sum of the XIC areas of the three precursor ions (monoisotopic mass [M] and isotopic masses [M+1 and M+2]) generated from each peptide was taken as the corresponding peak area.
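The peak area definition above amounts to a simple sum over the three isotope XICs. The sketch below shows that sum and, as an assumption not stated in the text, expresses the comparison of a light/heavy pair as a heavy-to-light ratio; the function names and example areas are hypothetical.

```python
def peak_area(isotope_areas):
    """Sum the XIC areas of the M, M+1 and M+2 precursor ions."""
    if len(isotope_areas) != 3:
        raise ValueError("expected areas for M, M+1 and M+2")
    return sum(isotope_areas)


def heavy_to_light_ratio(light_areas, heavy_areas):
    """Ratio of heavy to light peak areas (assumed comparison metric)."""
    light = peak_area(light_areas)
    heavy = peak_area(heavy_areas)
    if light == 0:
        return None  # ratio undefined when the light peptide has no signal
    return heavy / light


if __name__ == "__main__":
    light = [100.0, 60.0, 20.0]
    heavy = [200.0, 120.0, 40.0]
    print(peak_area(light), heavy_to_light_ratio(light, heavy))
```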
Coding environment
Python 3.7.7 was used to perform the machine learning analyses with the following imported libraries: numpy 1.19.1, pandas 1.1.0, scikit-learn 0.23.1, xgboost 0.9, matplotlib 3.2.2, and seaborn 0.10.1. Figures were prepared using matplotlib and seaborn. FeatureExtract.py and MachineLearning.py were used for extraction of chromatographic features and execution of the machine learning algorithms, respectively. Both Python scripts are provided in the Supplementary Materials.