To explore determinants of HLA-I peptide immunogenicity, we first assembled a dataset of 1,038 HLA-A02-restricted immunogenic 9mer epitopes from pathogens(29), as well as 1,845 non-immunogenic self 9mer peptides identified via mass spectrometry(30) (Fig. 1a). We used these data to develop a convolutional neural network that discriminates between immunogenic and non-immunogenic peptides (Fig. 1). We first applied a one-hot sequence representation (Fig. 1b) with a residue-similarity ordering, hypothesizing that both amino acid identity and arrangement within the peptide may affect immunogenicity. This input representation allows the neural network to learn aspects of amino acid similarity from a spatial encoding of proximity. By applying a 2 x 2 sliding convolutional filter, we integrated both amino acid sequence and similarity into the network (Fig. 1c). Applying a five-fold cross validation approach to our data, our current sequence convolutional neural network achieved a mean test AUC of 0.85 on the 2,883 peptide dataset. The AUCs achieved in a recent work focused on classifying immunogenic pathogen peptides range between 0.60 and 0.70 (28), suggesting enhancements in both data curation and architecture can improve prediction results (Supplementary Table 1 and Supplementary Text).
The general molecular topology of HLA-I bound to peptide antigens is widely conserved, but HLA-I/peptide complexes each interact in a specific manner with the surfaces of their conjugate TCRs. Therefore, we hypothesized that differences in peptide solvent exposure at the residue level could account for differences in immunogenicity, since the features of the presented peptide side chains represent the nexus of the immune recognition interface. In support of this hypothesis and consistent with previous analyses(16, 17), Riley et al.(28) reported that hydrophobic solvent-accessible surface area of presented peptides is correlated with heightened immunogenicity, particularly at positions 4, 5, and 8. To interrogate the structural dynamics of the peptide-HLA complex and its effects on immunogenicity, we conducted extensive atomistic molecular dynamics (MD) simulations of a representative subset of our peptide collection (Supplementary Figure 1). Notably, it has been widely established that protein dynamics simulations in explicit solvent properly capture the dynamic hydrophobic exposure and/or burial of side chains in peptide conformational ensembles, which are essential for proper function (31). MD simulations provide the added benefit of generating time-ordered conformations, facilitating the creation of temporal features that are useful in training classifiers.
In our study, peptides were selected to ensure sequence diversity within the MD subset, particularly in reference to positions 1 and 8. In total, 110 immunogenic pathogen and 123 non-immunogenic self-nonamers were loaded onto the HLA-A*02:01 peptide-binding groove (Fig. 2a) based on a common PDB structure (PDB: 5NMH) through structural alignment. Loaded peptides were then placed in an explicit solvent box, subjected to an annealing procedure to minimize clashes, equilibrated for 100 ns at 310 K and 1 atm, and then simulated at production for 500 ns at the same temperature and pressure (32-37). This procedure was also repeated for 17 immunogenic tumor neoantigens (29). The resulting simulation ensemble (comprised of 250 peptides, and 125 μs of aggregate MD simulation time) showed that hydrophobic exposure (defined by exposure of residues G, A, V, L, I, F, W, Y, and P) is enhanced at positions 4, 5, and 8 in immunogenic epitopes (Fig. 2b), with particular emphasis on positions 5 and 8. Another important observation derived from our MD simulations concerns non-hydrophobic exposure in non-immunogenic peptides: non-hydrophobic residues are considerably more exposed at peptide positions 4, 5, and 8, suggesting that immunogenicity is strongly anti-correlated with polar or charged residues at those positions. This finding is significant, despite the fact that enhanced hydrophobic exposure occurs in immunogenic epitopes at the same positions. Such anti-correlations with non-hydrophobic exposure are likely equally useful for immunogenicity classification.
Based on the above results, we hypothesized that an integrated representation of HLA-I binders, incorporating both sequence and dynamic structural features, would enable us to learn additional properties of HLA-I peptide immunogenicity and improve prediction of immunogenic epitopes. We designed a deep convolutional neural network that combines sequential, structural, and dynamical features for immunogenicity prediction. Our featurization approach involves computing residue-specific solvent-accessible surface areas (SASAs) as a function of time for each peptide sequence, with MD frames sampled every nanosecond (Fig. 2c). This procedure results in de facto coarse-grained images of the HLA-A*02:01 peptide presentation interface, a feature type that is well suited to convolutional architectures. The images were then split into 10 subtrajectories per peptide (representing 50 ns of MD each), with the dual purpose of providing more varied samples (2500, in this case) on which to train/test and to facilitate testing on shorter simulations, if desired. Importantly, samples from any single peptide were never divided between training and test data sets to avoid trivial classification outcomes.
Our integrated deep learning architecture is comprised of three modules: the sequence-based module adapted from above, and two additional sets of layers designed to encode molecular dynamics data. With respect to the MD data, each sample image corresponding to an MD trajectory was split into two channels to reflect our residue exposure observations (Fig. 3a): one for hydrophobic (G, A, V, L, I, F, W, Y, and P) residues, and a second for the complementary residues present in each peptide. These two images were then fed through respective series of three convolutional and pooling layers to generate internal tensor representations for both sets of residues. In these cases, pooling operations were only conducted in the time dimension of each image, as the full resolution in the residue dimension needs to be retained to capture each peptide position’s unique exposure statistics and relation to other residues.
The first two pooling layers in the MD convolution modules were designed to identify the most prominent features in each peptide column (a strategy often employed in convolutional neural networks called max pooling), while the third pooling layer was designed to compute an ensemble average over dynamic features representative of the underlying statistical physics of protein conformations. In standard MD data analysis, one typically employs some kind of clustering and/or ensemble averaging to understand the sampled conformational landscape. In the present case, the deep learning encoder plays the role of these traditional analytical tools in grouping and averaging conformations, using experimental immunogenicity labels as a guide.
The output of the hydrophobic and complementary convolutional modules was then concatenated and fed through a dense layer to yield a unified representation for the protein structure and dynamics. This MD-derived representation was then concatenated with the output from the sequence-based module and fed through a last dense layer, yielding an integrated sequential, structural, and dynamical representation that is used to predict a peptide’s probability of immunogenicity via a final softmax layer.
Conducting a five-fold cross validation on the peptide set for which MD data was collected, we found that the structural and dynamical features improved upon the performance of the sequence-only model. ROC curves for representative test sets from this procedure are presented in Fig. 3b. The MD plus sequence model yields an average test AUC of 0.89 (Fig. 3c), as opposed to the average AUC of 0.80 derived from the sequence-only model on an identical five-fold data split of this reduced peptide set. These results suggest that our integrated model can distinguish viral immunogenic peptides from non-immunogenic self peptides at a high level of discrimination.
Considering the performance gap between the sequence-only and MD plus sequence models, one can conclude the improvement derived from MD data does not result from trivial bias in how sequences were split between training and test sets. We conducted a pair of feature analyses aimed at explaining how the deep learning model learns from molecular dynamics data: the first based on residue identity, and the second based on residue position. In the first test, we artificially restricted the hydrophobic residue list to G, A, V, L, I, F (removing W, Y, and P, which have presumed importance because of their size and abundance, but can be considered non-hydrophobic in certain environments). This procedure reduces the AUC to 0.85 (Fig. 3c), suggesting that distinguishing between hydrophobic and complementary SASAs, rather than simply using generic residual exposure as a function of position, is important for predicting immunogenicity. Notably, the addition of MD-based features with the reduced hydrophobic residue list still significantly outperforms the sequence-only model on this same 250 peptide dataset (Fig. 3c).
In our second feature analysis, we aimed to ascertain the relative impact of residues at each peptide position (P1-P9) on immunogenicity prediction. To do so, we set all input apart from MD-based features at one specific peptide position to zero in each of nine five-fold training runs. While underlying correlations in residue dynamics present in MD trajectories cannot be removed, our ablation procedure ensures that features are only explicitly considered from one peptide position at a time.
This feature ablation procedure illustrates differences in immunogenicity prediction depending on the position considered (Fig. 3d). Perhaps unsurprisingly, the model is able to learn the least from dynamics of P2 and P9, which correspond to peptide anchor residue positions for HLA-A*02:01. This result further reinforces the point that peptide immunogenicity is a concept distinct from peptide-HLA binding, since properties of residues at P2 and P9 are known to be essential to the binding process. Outside of the anchor residues, model performance is relatively uniform across the positions considered, suggesting the model is able to distinguish a variety of peptide presentation modes in determining immunogenicity. The worst model performance at a non-anchor position occurs at P4; past work has suggested residue exposure at P4 is particularly important to peptide immunogenicity26, but we cannot draw the same conclusion based on the present data. Interestingly, our model is able to perform relatively well with residue solvent exposure information from P1, a peptide position not typically associated with TCR interactions. Regardless, the result from training on the full set of MD features simultaneously (mean test AUC ~0.89 with sequence information; ~0.84 without sequence information) is still significantly higher than that of any model trained on a single residue position, implying that explicit correlations between positions do contribute positively to classification.
The above data thus indicate that a deep learning model that combines molecular dynamics data with sequence features can be used to improve peptide immunogenicity prediction with respect to distinguishing pathogen peptides from self. Importantly, the addition of high-level MD features serves to add to the performance of a sequence-only baseline computed on identical datasets, suggesting that MD-derived improvements do not result from trivial similarities across train-test divides.
We next interrogated how well our convolutional model can be extended to classifying cancer neoantigens, a task that is useful for many therapeutic applications. For example, neoantigen classification would be useful if one would like to discriminate between immunogenic and non-immunogenic neopeptides for antigen selection in vaccine design, an approach that goes beyond just making generating HLA-I binding predictions. Though our initial model was trained on features from 17 HLA-A*02:01 immunogenic neoantigens(15), it was not exposed to any non-immunogenic neopeptides, which themselves may be distinct from standard self-peptides. Accordingly, we conducted a full MD simulation and featurization on 15 non-immunogenic neopeptides (and one additional immunogenic neoepitope, yielding 18 immunogenic peptides and 33 peptides in total) to create a neoantigen test set (hereafter called “Small Neo”), and evaluated this test set using on models trained on a range of training sets and features. For reference, the non-self/self (NS/S) MD training set listed in Supplementary Table 3 includes the 95 immunogenic pathogen and 123 self-non-immunogenic peptides fully featurized with MD, and is used as the single training set for the remainder MD-based models discussed in this work. Sequence-only models were constructed using all relevant training sets, and the 291 neoantigen dataset presented in Riley et al.(28) (hereafter known as “Full Neo”) was used as an additional test set for the purpose of comparison. Single MD plus sequence models were trained on the NS/S MD training set; for completeness, MD-only models were trained only on MD-derived features.
Not too surprisingly, the sequence-only models trained on either large peptide training (n=2883/n=3955) set could not classify either of the reduced (“Small Neo”; n=33) and full (“Full Neo”; n=291) neoantigen test data sets (Supplementary Table 2). This lack of neoantigen classification is consistent with previous results(28), where an AUC of 0.50 was reported for a sequence-only model and 0.60 for a structural model on the full neoantigen test set. Our smaller, diversity-conditioned peptide training set yields better classification on both neoantigen test sets (AUC = 0.67 – Small Neo; AUC = 0.57 – Full Neo) (Supplementary Table 3). Our sequence-only result on the full neoantigen set approximates that of previously described more expensive structural model(28) (See Supplementary Information).
Adding MD-derived features to our reduced NS/S training set and small neoantigen test set improved the AUC from 0.67 to 0.71 (Supplementary Table 3). This result falls far short of the 0.89 reported from cross-validation, suggesting that neoantigens are much more challenging to differentiate in terms of immunogenicity than peptides from pathogens. This observation could have origins in the fact that neoantigens are derived from the human genome, and thus might be more similar to one another than or have structural and dynamic features distinct from pathogen and human self peptides. However, an AUC of 0.71 still represents a reasonable level of classification, and the MD plus sequence, MD-only, and sequence-only results suggest that those feature sets can cooperate in determining immunogenicity. Fig. 4a shows box-and-whisker plots of predicted probabilities for the immunogenic and non-immunogenic peptides in the small neoantigen test set derived from the best-performing MD plus sequence model. The model does differentiate the two classes of peptides with reasonable separation: the average predicted probability for immunogenic samples is 0.69, while the average value for non-immunogenic samples is 0.42. By contrast, the corresponding values are approximately 0.90 and 0.05 in models trained and tested on pathogen/self peptides These differences may suggest that non-immunogenic peptides share features correlated with immunogenicity and that immunogenic neoepitopes diverge less from nonimmunogenic neoepitopes than pathogen peptides diverge from self peptides.
To further assess whether our model could identify immunogenic cancer neoepitopes among HLA-A*02:01 binders from individual tumor genomes, we generated neoantigen test sets from peptides derived from two individual patients with melanoma (n=8 peptides for “Patient 1” (38), and n=15 peptides for “Patient 2”(23)). The patient-specific test results are shown in Fig 4b and Supplementary Table 3. Intriguingly, the results for each patient diverge with respect to dominant feature type. However, the results for both patients were favorable overall (Fig. 3b); with the MD plus sequence model, both of Patient 1’s top ranked peptides and three of the top four-ranked Patient 2 peptides were identified as immunogenic. It is important to note that these patients’ data were obtained under specific conditions and limited by screening assay sensitivity and the presence of potential immune evasion. It is nonetheless encouraging that our model’s predictions feature strong enrichment of immunogenic peptides, providing an example of how neopeptides might be prioritized in personalized vaccine design.
Fig. 3c provides two anecdotal examples in which MD-derived features appear to improve classification outcomes. In the case of AVGSYVYSV, the MD simulation reveals conformational dynamics that expose Y5, a residue for which such exposure is likely correlated with immunogenicity (and indeed seems to advance the immunogenicity probability prediction from 6% in the sequence-only model to 99% in the MD plus sequence model). By contrast, the non-immunogenic LMASISSFL features large, hydrophobic residues at positions 5 and 8, which triggers a strong immunogenicity prediction in the sequence-only model (97%); however, the MD simulation has the effect of burying F8, reducing the predicted immunogenicity probability to 31% in the MD-only model. The presence of I5/F8 seems to be so strongly correlated with immunogenicity that even the MD plus sequence model produces a high immunogenicity probability (94%). While neural networks learn from a complicated array of internal representations that are often difficult to interpret, these two peptides provide examples of how MD-derived features can improve immunogenicity prediction in a deep learning framework.
To expand our exploration of non-immunogenic neoantigens, we featurized the full (n=274) non-immunogenic neoantigen set by collecting a single 50 ns sample for each peptide. All peptides were equilibrated for 25 ns prior to production. Combining these non-immunogenic neopeptide data with those previously generated for 17 immunogenic neoantigens and feeding all samples into the MD plus sequence model, we see a modest improvement in AUC relative to the sequence only model (0.61 vs. 0.57). These results are consistent with those reported by Riley et al.(28) for structural featurization (0.60). It is possible that a full (10 sample) featurization could improve our model’s performance on these data, though the improvement derived from MD is consistent with that observed for the neoantigens that were fully featurized (0.71 vs. 0.67).
These results on the full non-immunogenic neoantigen set raises the question of why neopeptides are apparently more difficult to discriminate than immunogenic pathogen and self non-immunogenic peptides. A part of the answer may lie in the characteristics of the non-immunogenic neopeptides available for study. Our MD featurization of the 274 neoantigens allows for a direct comparison of A02-presented peptide interfaces with those from other peptide classes. Non-immunogenic neopeptide SASA differences relative to the same self-peptide set featured in Fig. 2b are shown in Fig 4d. Surprisingly, the presented interfaces of these non-immunogenic neopeptides are quite similar to those of immunogenic pathogen peptides, with respect to both our chosen hydrophobic and complementary residue exposure metrics.