In this study, we explored the practical limits for loading differential amounts of peptides across channels in a TMT11 multiplex experiment. Biologically identical material derived from a common peptide pool generated from the white blood cells of a single patient was analyzed in aliquots ranging from the “gold standard” for clinical proteomics experiments—400 µg of peptide per TMT channel—down to 20 µg per channel (representing 20-fold less peptide). Samples were randomly divided across 2 TMT11 plexes and processed for LC-MS/MS-based global proteomics and phosphoproteomics measurements using a standard clinical proteomics protocol developed under the CPTAC. Analysis of the data generated from this work aimed to address two aspects of concern for quantitative proteomics experiments: missing data and data reproducibility.
It is readily clear from our results that peptide quantity and missing data are inversely correlated at the channel level: as lower amounts of peptide are labeled in a channel, the number of features for which no quantitative information is extracted increases. While it is well established that missing data is a major challenge in proteomics analyses (29, 30), it typically arises only when comparing data acquired across multiple TMT batches, and there are generally very low rates of missingness within a single TMT plex when all channels contain equivalent amounts of labeled peptide (31). Our data illustrate an exacerbation of the cross-plex missing data problem, with more than 50% of phosphopeptide identifications failing to be quantified in channels loaded with only 20 µg. Furthermore, within the two multiplexes, only 27% and 45% of phosphopeptides were quantified in all 10 channels, and only 79% and 87% of phosphopeptides were quantified in more than 6 channels (Supplemental Table 1). These rates of missing data across channels within a single plex are significantly higher than in experiments where all channels are loaded equally, in which, on average, more than 95% of phosphopeptides are observed in all channels and over 99% are observed in more than 6 channels (Supplemental Table 1). While the issue of missing data is more apparent in phosphoproteomics measurements, likely due to the low abundance of enriched phosphopeptides, the problem still exists in global proteomics. At the peptide level, only 76% of observations were quantified in all channels of either multiplex, while 96% and 97% of observations were quantified in more than 6 channels (Supplemental Table 2). Global proteomics measurements benefit from the aggregation of data to the protein level; when evaluating quantification at the protein level, 95% and 96% of observations have values in all channels of either plex (Supplemental Table 3).
Again, these rates of missing data are higher than in standard, equally loaded TMT experiments, where 99% of peptides and proteins are typically observed in all channels (Supplemental Tables 2 and 3). In all cases, the higher levels of missing data occur in channels with lower peptide loading, which we attribute to the reduced signal-to-noise ratio for these channels.
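The completeness figures discussed above (fraction of features quantified in all channels, fraction quantified in more than a given number of channels, and fraction missing per channel) can be tabulated with a short script. The following is a minimal sketch, not the analysis code used in this study; the function names and the toy intensity matrix are hypothetical, and missing values are assumed to be encoded as `None` or NaN.

```python
import math


def is_quantified(value):
    """Return True when a channel holds a usable (non-missing) intensity."""
    return value is not None and not (isinstance(value, float) and math.isnan(value))


def completeness_summary(matrix, more_than):
    """Summarize completeness for a feature-by-channel intensity matrix.

    matrix: list of rows, one per feature, each a list of channel intensities.
    Returns (fraction quantified in all channels,
             fraction quantified in more than `more_than` channels).
    """
    counts = [sum(is_quantified(v) for v in row) for row in matrix]
    n_channels = len(matrix[0])
    frac_all = sum(c == n_channels for c in counts) / len(counts)
    frac_more = sum(c > more_than for c in counts) / len(counts)
    return frac_all, frac_more


def missing_fraction_per_channel(matrix):
    """Return, for each channel (column), the fraction of features with no value."""
    n_features = len(matrix)
    return [
        sum(not is_quantified(v) for v in column) / n_features
        for column in zip(*matrix)
    ]


# Hypothetical toy data: 4 features measured in 3 channels.
toy = [
    [1.0, 2.0, 3.0],
    [1.0, None, 3.0],
    [1.0, 2.0, float("nan")],
    [None, None, 3.0],
]

frac_all, frac_more = completeness_summary(toy, more_than=1)
per_channel = missing_fraction_per_channel(toy)
print(frac_all, frac_more, per_channel)
```

For a TMT plex with 10 sample channels, calling `completeness_summary(matrix, more_than=6)` would give the two kinds of percentages quoted in the text.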
In addition to higher levels of missing data, TMT channels loaded with less peptide also displayed increased variation among the replicates. Coefficient of variation values steadily increased as peptide loadings were reduced, with noticeable differences present in the 40 µg and 20 µg samples, particularly in phosphoproteomics data. Additionally, PCA illustrated that samples in the 40 µg and 20 µg groups began to separate from the 400 µg, 200 µg, and 100 µg samples, which all clustered tightly together. We performed ANOVA to compare each loading group with the 400-µg group and observed statistical differences in the 20-µg group that persisted even after multiple hypothesis testing correction. Together, these data demonstrate that while 10-fold sample loading differences lead to increased variation and separation from other groups, the effects are negligible from a statistical standpoint; however, at 20-fold sample loading differences, the impact on data quantitation can no longer be corrected.
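As an illustration of the coefficient of variation comparison described above, the sketch below computes a per-feature CV across replicates and a median CV per loading group. It assumes linear-scale, positive intensities and the sample standard deviation; the function names and the intensity values are hypothetical and are not taken from this study.

```python
from statistics import mean, median, stdev


def cv_percent(intensities):
    """Coefficient of variation (%) across replicate intensities.

    Missing values (None) are skipped; returns None when fewer than
    two replicates are observed, because a CV is then undefined.
    """
    observed = [v for v in intensities if v is not None]
    if len(observed) < 2:
        return None
    return 100.0 * stdev(observed) / mean(observed)


def median_cv(features):
    """Median CV (%) over all features for which a CV can be computed."""
    cvs = [cv for cv in (cv_percent(f) for f in features) if cv is not None]
    return median(cvs) if cvs else None


# Hypothetical replicate intensities for two features per loading group.
group_400ug = [
    [100.0, 102.0, 98.0, 100.0],
    [50.0, 51.0, 49.0, 50.0],
]
group_20ug = [
    [100.0, 130.0, 70.0, 100.0],
    [50.0, None, 65.0, 35.0],
]

print("median CV, 400 ug:", median_cv(group_400ug))
print("median CV, 20 ug:", median_cv(group_20ug))
```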
The impacts of differential peptide loading on missing data and quantitative reproducibility have a compounding effect when one is interested in comparing samples. First, the amount of missing data present in samples with lower loadings reduces the number of observations for which statistical analysis can be performed: in this case, ANOVA testing required data to be present in all samples, which drastically reduced the number of phosphopeptides from those observed (27,351) to those that could be analyzed statistically (3,366). Relaxing the criteria for missing data (i.e., requiring only 2 or 3 observations of the 4 samples in a group) will allow more features to be compared, but a decreased n will lower the statistical power. Second, among the features that have enough observations for statistical testing, quantitation is negatively impacted in samples with lower peptide loadings, likely because signals lie closer to the noise level of the mass spectrometer, leading to increased variation when compared with higher loading groups. Importantly, the samples in this experiment were all derived from a common biological source; in true clinical studies, negative impacts on reproducibility will increase quantitative variability and reduce the statistical power, hindering the ability to confidently detect differences between patients, tissue types, or normal vs. diseased samples (28, 32).
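The trade-off between strict and relaxed missing-data criteria can be made concrete by counting how many features remain testable under each threshold. This is a minimal sketch with hypothetical names and toy data, assuming each feature is stored as a mapping from loading group to its replicate values, with `None` marking a missing observation.

```python
def passes_filter(groups, min_obs):
    """True when every group has at least `min_obs` observed replicates."""
    return all(
        sum(v is not None for v in values) >= min_obs
        for values in groups.values()
    )


def count_testable(features, min_obs):
    """Number of features that satisfy the per-group observation threshold."""
    return sum(passes_filter(groups, min_obs) for groups in features)


# Hypothetical features, each with 4 replicates in two loading groups.
features = [
    {"400ug": [1.0, 1.1, 0.9, 1.0], "20ug": [1.0, 1.2, 0.8, 1.1]},
    {"400ug": [2.0, 2.1, 1.9, 2.0], "20ug": [2.0, None, 1.7, 2.3]},
    {"400ug": [3.0, 3.1, 2.9, None], "20ug": [3.0, None, None, 3.4]},
    {"400ug": [4.0, 4.1, 3.9, 4.0], "20ug": [None, None, None, 4.2]},
]

for threshold in (4, 3, 2):
    print(threshold, count_testable(features, threshold))
```

Lowering the threshold admits more features, but each admitted feature is then tested with fewer observations per group.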
To illustrate this point, we calculated and plotted the standard deviations for each phosphopeptide when using only the samples loaded with 400 µg of peptide, or when combining any of the other loading groups with the 400 µg samples. As lower loading groups are combined with the 400 µg samples, standard deviation measurements increase (Supplemental Fig. 2A). Using the mean standard deviation for each sample set, we estimated the sample size per group necessary to measure fold changes ranging from 1.2- to 2-fold with statistical significance (Supplemental Fig. 2B). While larger fold changes are detectable in any sample set with a reasonable number of patients per group (<2 patients per group for equal 400 µg loadings; 7.8 patients per group with 400 µg and 20 µg samples), smaller fold changes (1.2- and 1.4-fold) require a dramatic increase in the number of patients per group to detect statistical significance when combining the more variable 20 µg loading group with the 400 µg loading group. Compared to equal 400 µg loading, where only 4.8 or 2.4 patients per group are required to detect fold changes of 1.2 and 1.4, respectively, combining 400 µg and 20 µg loadings increases the necessary sample sizes to 100 and 29 patients per group. From a clinical standpoint, obtaining and analyzing samples from this number of subjects can be a major challenge.
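The dependence of sample size on standard deviation and fold change can be sketched with a textbook power calculation. The example below uses the normal approximation for a two-sided, two-sample comparison of means on the log2 scale, n = 2(z_{1-α/2} + z_{1-β})² σ² / δ², where δ is the log2 fold change. This is an assumption chosen for illustration and is not necessarily the method used to produce Supplemental Fig. 2B; the significance level, power, and standard deviations shown are likewise hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(sd_log2, fold_change, alpha=0.05, power=0.8):
    """Approximate samples per group needed to detect `fold_change`.

    sd_log2: standard deviation of log2 abundances (assumed equal in both groups).
    Uses the normal approximation to the two-sample t-test and returns an
    unrounded value; round up to obtain a usable group size.
    """
    delta = math.log2(fold_change)               # effect size on the log2 scale
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sd_log2 / delta) ** 2


# Hypothetical standard deviations for an equally loaded set and a mixed set.
sd_equal, sd_mixed = 0.2, 0.6

for fc in (1.2, 1.4, 2.0):
    n_equal = samples_per_group(sd_equal, fc)
    n_mixed = samples_per_group(sd_mixed, fc)
    print(f"fold change {fc}: {n_equal:.1f} vs {n_mixed:.1f} per group")
```

Because the required n scales with the square of the standard deviation and with the inverse square of the log fold change, modest increases in variability translate into large increases in the number of subjects needed for small fold changes.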
Reproducible and in-depth proteomic analysis of samples smaller than 500,000 cells requires significant improvements in sensitivity over standard approaches. The necessary technological improvements are actively being pursued in our lab (21, 33–36) and others (20, 37–39) with great success in recent years. However, much more work is needed to make these technologies available to less specialized laboratories.