LC Gradient Time for Protein Identification at Single-Cell Levels
In general, optimal protein identification for large cell populations (≥20,000 cells) and bulk cell samples can be achieved by DIA-MS analysis of 1 µg of peptides with a 120-min LC gradient27. However, this approach may not be ideal for small cell populations. Therefore, we compared different LC gradient settings for peptide amounts equivalent to the hundred-cell, ten-cell, and single-cell levels. We first compared two LC gradient settings for a single cell (0.05 ng of peptides), 10 cells (0.5 ng), and 100 cells (5 ng) from MDA-MB-231 cell samples by computing the ratio of the average number of proteins identified with the 15-min LC gradient to that identified with the 120-min LC gradient. More proteins were identified with the 15-min LC gradient at the single-cell and ten-cell levels: as shown in Fig. 1a, the 15-min to 120-min protein identification ratios were greater than 1 for single and ten MDA-MB-231 cells. We observed a similar result for the single cisplatin-treated PACC (0.2 ng of peptides) (Fig. 1b), where the 15-min to 120-min ratio was also greater than 1. These results indicate that fewer proteins were identified with the 120-min LC gradient and that a short LC gradient improved overall protein identification at single-cell-level peptide injection amounts (< 2 ng). Therefore, we chose 15 min as the optimal LC gradient setting for single-cell-level DIA analysis.
Evaluation of Global Proteomic Analysis at Single-Cell Level
In addition to identifying a suitable LC gradient for acquiring DIA-MS data at the single-cell level, we evaluated the search space of single-cell DIA data (i.e., the size of the internal library generated during the directDIA search) among the established co-searching groups. For the DIA data of MDA-MB-231 cell samples at the single-cell level (0.05 ng of peptides), searching with GS-1r_M (a total of 271 peptide precursors from the raw file of a single MDA-MB-231 cell) identified 126 protein groups (Fig. 2a). When the size of the internal library increased to 5787 precursors (i.e., GS-16r), we observed the highest peptide and protein coverage for the single MDA-MB-231 cancer cell. A total of 1093 peptide precursors and 406 protein groups were identified using GS-16r (Fig. 2a), corresponding to gains of 303% and 222% at the peptide and protein levels, respectively, compared with the results obtained using GS-1r_M only. For the cisplatin-treated PACC at the single-cell level (0.2 ng of peptides), 621 protein groups were identified using the directDIA approach to search against the internal library of GS-1r_P (2258 peptide precursors) (Fig. 2b). Moreover, the co-searching group GS-20r (16496 precursors) yielded the best identification, with 6153 peptide precursors and 1530 protein groups identified (Fig. 2b). Using GS-20r therefore gave gains of 172% and 146% at the peptide precursor and protein levels, respectively, relative to using GS-1r_P. Of note, the number of identified proteins and peptides did not necessarily increase as the search space expanded. As shown in Fig. 2c, the best precursor identification fell within a 2- to 4-fold difference between the total number of precursors in the internal library and the total number of precursors detected in the sample of interest. Our results suggested that the internal library size was critical to protein identification at the single-cell and sub-nano-gram levels via the DIA-MS approach.
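The reported gains can be checked directly from the identification counts given above. The following is a minimal sketch, assuming that "gain" means the relative increase over the single-run library result; the function name `percent_gain` and the dictionary layout are illustrative and not part of the original workflow.

```python
def percent_gain(baseline: int, expanded: int) -> float:
    """Relative gain (%) of `expanded` over `baseline` (assumed definition of 'gain')."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (expanded - baseline) / baseline * 100.0


# Counts taken from the text (Fig. 2a and Fig. 2b): (single-run library, co-searching library).
results = {
    "MDA-MB-231, GS-16r vs GS-1r_M": {"precursors": (271, 1093), "protein_groups": (126, 406)},
    "PACC, GS-20r vs GS-1r_P": {"precursors": (2258, 6153), "protein_groups": (621, 1530)},
}

# Round to whole percentages, as reported in the text.
gains = {
    label: {level: round(percent_gain(*pair)) for level, pair in levels.items()}
    for label, levels in results.items()
}

for label, gain in gains.items():
    print(f"{label}: {gain['precursors']}% precursors, {gain['protein_groups']}% protein groups")
```

Under this definition the rounded values reproduce the 303% and 222% gains for the single MDA-MB-231 cell and the 172% and 146% gains for the single PACC.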
Furthermore, we evaluated the co-searching methods at the ten-cell level. We identified 1816 protein groups (Fig. 2d) and 3260 protein groups (Fig. 2e) from 10 MDA-MB-231 cells and 10 PACCs, respectively. Similarly, the best peptide precursor and protein identifications also fell within a 2- to 4-fold difference between the internal library size and the number of detected precursors (Fig. 2f). In addition, for the single-cell and ten-cell injections, we observed optimal protein identification via directDIA search when they were co-searched with injections of similar samples whose peptide amounts differed by about 10-fold (S-Table 5). Overall, using a co-searching internal library generated from similar samples during the directDIA search was essential to enhancing protein identification at the single-cell and nano-gram levels.
Reproducibility on Single-Cell Level Proteomic Analysis Using DIA
After evaluating the LC gradient and the size of the internal library for DIA analysis at the single-cell level, we investigated the inter-person reproducibility of our established workflow. Two sets of samples, each containing cisplatin- and docetaxel-treated PACCs and parental MDA-MB-231 cancer cells, were prepared and analyzed a month apart by two researchers following the procedures stated in the Materials and Methods section. Here, we use the cisplatin-treated PACC sample and one MDA-MB-231 sample to demonstrate the reproducibility of our DIA method. We observed pairwise Spearman correlations >0.80 for the MDA-MB-231 cell sample between the two sets at the single-cell, ten-cell, and hundred-cell levels (Fig. 3a). We found similar results for the cisplatin-treated PACC samples, where Spearman correlations ≥0.83 were observed in the three different cell populations (Fig. 3b). Taken together, these results demonstrated that our optimized DIA workflow provided robust quantitative global proteome profiling for single-cell DIA analysis, and that larger cell populations further improved the reproducibility.
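For readers who wish to reproduce this type of comparison, the following is a minimal sketch of a pairwise Spearman correlation between two preparations, using only the Python standard library. It assumes the two lists hold quantities for the same proteins in the same order; the intensity values are hypothetical placeholders, not data from this study, and the actual analysis software used here is not implied.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical protein quantities for the same five proteins in two preparations.
set_a = [10.0, 20.0, 30.0, 40.0, 50.0]
set_b = [12.0, 18.0, 45.0, 33.0, 60.0]
rho = spearman(set_a, set_b)
print(f"Spearman correlation: {rho:.2f}")
```

Because the coefficient depends only on ranks, it is insensitive to global intensity differences between the two preparations, which makes it a reasonable choice for comparing runs acquired a month apart.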
PTM Analyses for Nano-Gram Levels of Peptides without Enrichment
Protein post-translational modifications (PTMs) are important for the regulation of various protein activities and cellular signaling events, and alterations in PTMs are associated with many diseases, including cancer28. PTM enrichment is an essential procedure in MS-based PTM analysis; however, there are very few reports of enrichment strategies at the nano-gram or single-cell level for MS analysis. Thus, we established an alternative approach for PTM analysis at this level by utilizing global proteomic DIA data and spectral libraries built from bulk samples. Unlike DDA-MS, DIA-MS co-fragments all peptide precursors within a selected m/z range to produce comprehensive MS2 spectra, so the information on modified peptides should be retained in the global data even without PTM enrichment. Therefore, we were able to directly identify PTMs from nano-gram-level (i.e., 100 cells) global proteomic DIA data using customized PTM spectral libraries for phosphorylation, acetylation, and ubiquitination.
We first explored the possibility of identifying phosphorylation, acetylation, and ubiquitination from global DIA data of PDX samples at the nano-gram level (Fig. 4a). We identified 72 phosphorylated peptides, 35 acetylated peptides, and 99 ubiquitinated peptides from 100 PACCs, indicating that PTMs can be found without enrichment.
We further evaluated the association between PTM spectral library size and PTM identification by examining the change in the phosphopeptide identification rate from nano-gram-level global DIA data, since a large collection of phosphopeptide-enriched DDA and DIA raw files from the CPTAC study22 allowed the construction of spectral libraries of various sizes, ranging from ~42K to ~141K precursors. As shown in Fig. 4b, among the three phosphopeptide spectral libraries, the library containing ~84K precursors yielded the highest number of identifications for 100 MDA-MB-231 cells (5 ng of peptides) and 100 PACCs (20 ng of peptides), with 68 and 166 phosphopeptides with localized sites identified, respectively. These results suggested that PTM analysis at the nano-gram scale could be achieved by utilizing global DIA data along with a suitable PTM library built from bulk samples.
Application of Single-Cell Level DIA Approach to the Drug Resistant Cancer Cell Study
To investigate whether the difference in cell size affected protein identification and expression patterns, we conducted a comparative analysis between PACCs (large cells) and MDA-MB-231 cells (smaller cells) at the 1 µg peptide injection level and at the single-cell level of peptide injection. We observed a 98.5% overlap in protein identification between the cisplatin-treated PACC and MDA-MB-231 samples (Fig. 5a), suggesting that they shared a similar proteome profile regardless of cell size. At the single-cell level, we examined the protein groups identified by co-searching all single-cell raw files (i.e., GS-10r, Fig. 2a). We identified 388 protein groups in the single MDA-MB-231 cell and 688 protein groups in the cisplatin-treated PACC at the single-cell level (Fig. 5b).
Although more proteins were identified using 1 µg peptide injections, we speculated that single-cell proteomic analysis could capture the real protein expression changes, compared with bulk proteomic analysis, which would assist in the study of cellular heterogeneity between PACCs and parental MDA-MB-231 cells. We compared the protein fold changes as shown in Fig. 5c. At the single-cell level, we found that the majority of proteins showed higher expression in the cisplatin-resistant PACC relative to the parental MDA-MB-231 cell, with log2 fold changes ranging from 2 to 5, indicating an increase in protein copy number for these proteins after MDA-MB-231 cells were treated with cisplatin and transitioned to a PACC state. The exceptions were ubiquitin-binding autophagy-associated proteins (e.g., SQSTM1) and histone proteins (e.g., HIST1H4A and HIST1H1E), which displayed similar expression profiles between PACCs and control cancer cells. In contrast, in the bulk analysis of MDA-MB-231 cells and cisplatin-treated PACCs, most proteins showed similar abundances at the same injection amount (1.0 µg); however, we found a >2-fold decrease in PACCs compared with MDA-MB-231 cancer cells for SQSTM1, HIST1H4A, and HIST1H1E. Had we run the samples only at the 1 µg peptide level, we would have observed only the decrease in SQSTM1, HIST1H4A, and HIST1H1E protein intensities in PACCs. In the single-cell-level analyses, however, these proteins maintained the same expression levels in both PACCs and parental MDA-MB-231 cells. The similar expression level of SQSTM1 suggested that PACCs and MDA-MB-231 cells had similar metabolic activity29, which may not be enough for a PACC to undergo depolyploidization and transition to a typical MDA-MB-231 cell. Moreover, because a lack of multiple copies of histone proteins leads to cell cycle elongation30,31, PACCs may also be unable to divide into normal MDA-MB-231 cancer cells.
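The fold-change comparison described above can be illustrated with a short sketch. It assumes that fold change is computed as the PACC-to-parental intensity ratio and that a 2-fold change (|log2 fold change| ≥ 1) separates "changed" from "similar" proteins. The protein names and intensity values below are hypothetical placeholders, not measurements from this study, and the function names are illustrative.

```python
import math


def log2_fold_change(pacc_intensity: float, parental_intensity: float) -> float:
    """Log2 ratio of PACC to parental intensity (assumed fold-change definition)."""
    if pacc_intensity <= 0 or parental_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(pacc_intensity / parental_intensity)


def classify(log2fc: float, threshold: float = 1.0) -> str:
    """Label a protein using an assumed 2-fold cutoff (threshold = 1.0 on the log2 scale)."""
    if log2fc >= threshold:
        return "higher in PACC"
    if log2fc <= -threshold:
        return "lower in PACC"
    return "similar"


# Hypothetical (PACC, parental) intensities for three placeholder proteins.
intensities = {
    "PROTEIN_A": (8000.0, 1000.0),
    "PROTEIN_B": (1000.0, 1000.0),
    "PROTEIN_C": (250.0, 1000.0),
}

summary = {
    name: (log2_fold_change(pacc, parental), classify(log2_fold_change(pacc, parental)))
    for name, (pacc, parental) in intensities.items()
}

for name, (log2fc, label) in summary.items():
    print(f"{name}: log2FC = {log2fc:+.1f} ({label})")
```

On this scale, the log2 fold changes of 2 to 5 reported at the single-cell level correspond to 4- to 32-fold higher intensities in the PACC.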
Taken together, quantitative analysis of the single-cell proteome via the DIA approach can benefit our understanding of cellular heterogeneity and provide more accurate protein expression profiles, which may be misinterpreted at the bulk population level.