In this analysis we showed that the computational method NetMHCpan4.0 predicted 95% of previously experimentally mapped HIV-1 epitopes in 6 HIV-infected individuals expressing a total of 20 different HLA class I alleles. In our IFN-γ ELISPOT assays we evaluated 757 17mer peptides, overlapping by 11 amino acids, that covered the whole subtype A1 and D consensus proteomes. Because the NetMHCpan4.0 algorithm scans protein sequences and outputs binding scores for 9-14mer peptides, for the purpose of this evaluation each predicted 9-mer epitope was matched to the experimental 17mer(s) that contained it wholly. Of the 5 experimentally determined epitopes missed by the algorithm, 4 were in fact predicted as binders, but to HLA alleles not expressed by the participant, and were therefore excluded. Roughly 30% (37) of the 125 positive predictions were not experimentally supported in our tests. These are not necessarily false positives: ELISPOT detection depends on the frequency of specific T cells in the participant's repertoire, and we observed shifts in dominant T-cell specificities within individual participants between early and later time points after HIV-1 infection. A formal ROC evaluation of the NetMHCpan4.0 score as a classifier of peptides recognised versus not recognised by PBMC in IFN-γ ELISPOT assays yielded an AUC of 0.928. Thus, although experimental confirmatory tests cannot be dropped altogether, the NetMHCpan4.0 algorithm could save considerable time and resources by restricting verification to the predicted epitopes.
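The peptide layout, the matching of predicted epitopes to tested peptides and the AUC calculation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the function names are hypothetical, the extra C-terminal peptide added when the 6-residue step does not reach the end of the protein is an assumption, and the AUC is computed as the probability that a recognised peptide scores higher than a non-recognised one (ties counting one half), assuming that higher scores indicate stronger predicted binding. If NetMHCpan %Rank values were used instead (lower is better), they would need to be negated first.

```python
import math  # not strictly needed here; kept so the block mirrors typical analysis scripts
from typing import Dict, List


def tile_peptides(protein: str, length: int = 17, overlap: int = 11) -> List[str]:
    """Split a protein into overlapping peptides (hypothetical helper).

    17mers overlapping by 11 residues start every 6 residues. If the last
    regular peptide stops short of the C-terminus, one extra peptide ending
    at the C-terminus is appended (an assumption, not a stated study rule).
    """
    step = length - overlap
    starts = list(range(0, max(len(protein) - length, 0) + 1, step))
    if starts[-1] + length < len(protein):
        starts.append(len(protein) - length)
    return [protein[s:s + length] for s in starts]


def match_epitopes(epitopes: List[str], tiles: List[str]) -> Dict[str, List[str]]:
    """Map each predicted short epitope to every tested peptide that contains it wholly."""
    return {ep: [t for t in tiles if ep in t] for ep in epitopes}


def roc_auc(scores: List[float], recognised: List[bool]) -> float:
    """ROC AUC as P(score of a recognised peptide > score of a non-recognised one).

    Ties count as one half. Assumes higher scores mean stronger predicted binding.
    """
    pos = [s for s, r in zip(scores, recognised) if r]
    neg = [s for s, r in zip(scores, recognised) if not r]
    if not pos or not neg:
        raise ValueError("need at least one recognised and one non-recognised peptide")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `tile_peptides` applied to a 40-residue sequence returns five 17mers starting at residues 0, 6, 12, 18 and 23 (0-based), the last one ending at the C-terminus.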
We observed a strong correlation between protein size and the number of predicted epitopes, with the largest number of epitopes in Env, followed by Pol and Gag. Subtype D sequences yielded more predictions than subtype A1, although the difference was not statistically significant.
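The text does not state which correlation coefficient was used. Assuming a Spearman rank correlation between protein length and the number of predicted epitopes per protein, the calculation could be sketched as below; the helper names are hypothetical, and tied values receive average ranks.

```python
import math
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

With protein lengths as `x` and per-protein prediction counts as `y`, a value close to 1 would correspond to the strong positive correlation reported above.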
Because participants had been enrolled in the acute/early phase of HIV-1 infection, and we had observed intra-participant changes in epitope recognition between early and late time points after infection, we compared the predicted binding scores of confirmed epitopes at these time points. We found a statistically significant shift towards recognition of peptides with higher predicted binding as the infection entered the chronic phase. This might reflect better sustained T-cell responses directed at more stable HLA/peptide complexes as the infection progresses into chronicity.
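The statistical test used to compare binding scores between time points is not specified here. Purely as an illustration, assuming an unpaired comparison of the scores of epitopes recognised at early versus late time points, a Mann-Whitney U test with a normal approximation could be sketched as follows. The function name is hypothetical, no tie or continuity correction is applied (so the p-value is only a rough approximation for small samples), and higher scores are assumed to mean stronger predicted binding.

```python
import math
from typing import List, Tuple


def mann_whitney_u(early: List[float], late: List[float]) -> Tuple[float, float]:
    """U statistic (late vs early) and a two-sided normal-approximation p-value.

    U counts pairs in which the late score exceeds the early score (ties = 1/2),
    so U above n_late * n_early / 2 indicates higher scores at the late time point.
    """
    n_late, n_early = len(late), len(early)
    u = 0.0
    for a in late:
        for b in early:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    mean_u = n_late * n_early / 2
    sd_u = math.sqrt(n_late * n_early * (n_late + n_early + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided
```

In practice a library implementation with tie correction or exact p-values would be preferable; the loop above only makes the logic of the comparison explicit.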
The NetMHCpan4.0 algorithm, which is trained on binding affinity data integrated with data on eluted, naturally processed ligands, reflected the optimal length of 9 amino acids for HLA class I binding: the number of predictions decreased as peptide length increased from 9 to 11 amino acids, with only minor differences between lengths of 11, 12, 13 and 14 amino acids. With a single exception, predicted binders of 11-14 amino acids contained at least one 9mer that was predicted to bind on its own. This suggests that the extra amino acids beyond the canonical HLA class I anchor residues at peptide positions 2 and 9 could exert a destabilizing effect that accounts for the fewer predictions at longer lengths.
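The containment check behind the observation that longer predicted binders almost always include a predicted 9mer binder could be sketched as follows. It assumes that both inputs are predictions for the same HLA allele and that "included" means a contiguous 9-residue stretch; the function names are hypothetical. Peptides returned by the second function would be the exceptions, i.e. longer binders with no predicted 9mer binder inside them.

```python
from typing import Iterable, List, Set


def contained_9mers(peptide: str) -> Set[str]:
    """All contiguous 9-residue windows of a peptide."""
    return {peptide[i:i + 9] for i in range(len(peptide) - 8)}


def binders_without_9mer_core(long_binders: Iterable[str],
                              nine_mer_binders: Iterable[str]) -> List[str]:
    """Return longer predicted binders that contain no predicted 9mer binder.

    Both inputs are assumed to be predictions for the same HLA allele.
    """
    binders9 = set(nine_mer_binders)
    return [p for p in long_binders if not (contained_9mers(p) & binders9)]
```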
Important limitations of this study are the mismatch in size and scanning frame between the previously tested experimental peptides (17mers overlapping by 11 aa) and the algorithmically predicted peptides (9-14mers overlapping by 8-13 aa), and the lack of predictions for HLA class II-restricted epitopes, which might have contributed a fraction of the IFN-γ ELISPOT responses. Further development of the prediction algorithm could address these limitations.