Identification and characteristics of the HLA-I immunopeptidome in the HepG2 cell line
The workflow for identifying neoantigens is illustrated in Fig. 1. Immunoprecipitation successfully enriched HLA-I peptide complexes from HepG2 cell lysate. To evaluate the enrichment efficiency, we compared the HLA A/B signals in the TCL, FT and elution by western blotting and quantified them by grayscale analysis (Fig. 2a, supplementary Fig. S1a). Western blot analysis revealed that the HLA A/B signal in the FT was significantly lower than in the TCL, while a strong signal was observed in the elution. Grayscale quantification indicated that nearly 50% of the HLA A/B in the TCL was successfully enriched and eluted (Fig. 2b). Subsequently, a total of 8,549 peptides were identified through LC-MS/MS analysis. Of these, approximately 74.6% (n = 6,376) were 8–14 a.a in length, which corresponds to the length characteristics of the HLA-I immunopeptidome (supplementary Table S1). In three biological replicates, we identified 4,537, 3,900 and 3,481 peptide sequences of 8–14 a.a, respectively (Fig. 2c). Venn diagrams showed that peptides shared among the replicates made up the largest portion, demonstrating consistency in the enrichment and identification of the HLA-I immunopeptidome (Fig. 2d). Furthermore, hyperscores, which evaluate spectrum quality by comparing observed spectra with theoretical spectra generated by MSFragger,[29] were compared among the HLA-I immunopeptidomes from the three replicates and indicated consistent mass spectrum quality (Fig. 2e). The theoretical retention time (RT) correlated strongly with the observed RT (supplementary Fig. S1b). Among peptides of 8–14 a.a, 9 a.a peptides accounted for the majority (Fig. 2f, g).
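The length filter and the reported proportion can be reproduced with a short calculation. The following Python sketch is illustrative only: the function name `is_hla1_length` and the example peptide list are hypothetical, and only the counts 8,549 and 6,376 are taken from the text above.

```python
MIN_LEN, MAX_LEN = 8, 14  # HLA-I length window used in the text


def is_hla1_length(peptide: str) -> bool:
    """Return True if the peptide is 8-14 residues long (inclusive)."""
    return MIN_LEN <= len(peptide) <= MAX_LEN


# Hypothetical example list; the two 9-mers appear later in this section,
# the other two entries are placeholders that fall outside the window.
example_peptides = ["SLFDVSHML", "RYSEYTEEF", "PEPTIDE", "A" * 15]
kept = [p for p in example_peptides if is_hla1_length(p)]

# Proportion reported in the text: 6,376 of 8,549 identified peptides.
total_identified = 8549
within_window = 6376
percent_within_window = round(100 * within_window / total_identified, 1)

print(kept, percent_within_window)
```

With the reported counts, the computed percentage rounds to the 74.6% stated above.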
Discovery of HepG2-specific mutant peptides using the HepG2 WES-based database
Given the satisfactory data quality of the HepG2 HLA-I immunopeptidome, we used the HepG2 WES-based database to identify potential neoantigens in the HLA-I immunopeptidome. The Gibbs clustering approach was used to analyze the anchor residues of the eluted peptides. The results revealed binding motifs that clustered primarily into two groups (Fig. 3a, b). Subsequent analysis was performed using NetMHCpan 4.1 with default settings. Peptides were categorized as strong binders when the percentage rank was below 0.5% and as weak binders when the percentage rank was between 0.5% and 2.0%. The results indicated that a large proportion of 8–14 a.a peptides were predicted to bind HLA-I molecules. Binding preference differed across HLA alleles, with the most peptides assigned to HLA-A*02:01 (n = 2,014) and HLA-A*24:02 (n = 1,918) (Fig. 3c). The distribution of amino acids at the P2 (second amino acid) and PΩ (last amino acid) positions in the Gibbs-clustered peptides corresponded to the distribution observed in the NetMHCpan reference database (supplementary Fig. S2d, e).
These findings are consistent with the results obtained from Gibbs clustering and indicate that the binding motifs cluster around HLA-A*02:01 and HLA-A*24:02. Among strong binders, 9 a.a peptides were the most frequently observed (Fig. 3d), which is consistent with the length distribution of HLA-I immunopeptides reported in previous studies[12, 14, 36]. Furthermore, 9 a.a peptides were predicted to have a lower elution percentage rank, indicating stronger binding to the HLA-I molecules (Fig. 3e). However, no significant difference in peptide intensity was observed among peptides of different lengths (supplementary Fig. S2a). We identified only 1 mutant peptide using the HepG2 WES-based database (Table 1, Fig. 3f), indicating the limited effectiveness of personalized databases in identifying the mutant immunopeptidome. Similar findings were reported in a previous study[16]. Notably, the HLA-I immunopeptidome of HepG2 contained both the wild-type peptide "SLFDVSHML" and the mutant peptide "SLFDASHML". Although the mutant and wild-type peptides had low intensities, their predicted HLA-I affinity was higher than that of other peptides (Fig. 3g, h).
Table 1
List of mutant peptides identified by HepG2 WES-based database
No. | Peptide | Hyperscore | Length | Nucleotide variant | Amino acid variant | Protein | Binding level | HLA allele | Elution rank (%) |
1 | SLFDASHML | 11.98 | 9 | c.242T > C | p.V81A | AMT | Strong | HLA-A*02:01 | 0.003 |
2 | SLFDVSHML | 15.81 | 9 | WT | WT | AMT | Strong | HLA-A*02:01 | 0.005 |
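The binding-level categories described above, and shown in the "Binding level" column of Table 1, follow directly from the elution percentage rank. A minimal Python sketch of that rule is given below. The function name `binding_level` is hypothetical, and treating a rank of exactly 2.0% as weak is an assumption, since the text gives only the range 0.5 to 2.0%.

```python
STRONG_THRESHOLD = 0.5  # % rank below which a peptide is a strong binder
WEAK_THRESHOLD = 2.0    # upper bound of the weak-binder range (assumed inclusive)


def binding_level(rank_percent: float) -> str:
    """Map an elution percentage rank to Strong / Weak / None."""
    if rank_percent < 0 or rank_percent > 100:
        raise ValueError("percentage rank must lie between 0 and 100")
    if rank_percent < STRONG_THRESHOLD:
        return "Strong"
    if rank_percent <= WEAK_THRESHOLD:
        return "Weak"
    return "None"


# Elution ranks taken from Table 1.
table1_ranks = {"SLFDASHML": 0.003, "SLFDVSHML": 0.005}
table1_levels = {pep: binding_level(rank) for pep, rank in table1_ranks.items()}
print(table1_levels)
```

Applied to the two ranks in Table 1, the rule returns "Strong" for both peptides, matching the table.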
Generation of the COSMIC-based database
The construction workflow for the mutation database is summarized in four parts (Fig. 4a). Mutated gene sequences were extracted from the COSMIC genomic database, which contained 1,233,831 mutations, and from the HepG2 WES database, which contained 388 mutations. Because only non-synonymous mutations change the amino acid sequence, further filtering was conducted to remove synonymous and redundant mutations. As a result, 279 mutations were retained in the HepG2 WES-based database and 81,137 mutations were retained in the COSMIC-based database. Among the 957 samples in the COSMIC-based database, TP53 was the most frequently mutated gene, followed by TTN and CTNNB1 (Fig. 4b). Notably, most of the samples exhibited a low TMB, although a few displayed exceptionally high TMB. These results are consistent with previous findings that HCC has a relatively lower TMB than other tumor types[19, 20]. The distribution of mutation classes in the HepG2 WES-based and COSMIC-based databases was similar, with missense mutations being the most common, followed by nonsense mutations (Fig. 4c and Table 2). The proportions of other mutation types, including nonstop mutations, insertions, and deletions, were similar in the COSMIC-based database. Only 1 mutation was found in both databases, while 222 mutated genes were shared (Fig. 4d, e). For subsequent MS data searches, the COSMIC-based database was constructed by incorporating the filtered somatic mutations from both COSMIC and HepG2 WES into the UniProt Human database.
Table 2
Statistics of mutation classes in the HepG2 WES-based and COSMIC-based databases
Mutant classification | HepG2-based database | COSMIC-based database |
Translation start site | 0 | 21 |
Nonstop mutation | 0 | 415 |
Nonsense mutation | 12 | 6640 |
Missense mutation | 240 | 75883 |
In frame insertion | 3 | 100 |
In frame deletion | 0 | 497 |
Frameshift insertion | 3 | 1052 |
Frameshift deletion | 4 | 1278 |
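The final construction step described above, incorporating a filtered missense mutation into a protein sequence, can be sketched as follows. This is a simplified illustration and not the pipeline used in the study: the function names, the toy protein sequence, and the variant position `p.V10A` are all hypothetical, and enumerating fixed-length windows around the substitution is an added assumption about how candidate mutant peptides arise from a mutated entry.

```python
import re


def apply_missense(protein: str, variant: str) -> tuple[str, int]:
    """Apply a substitution written like 'p.V10A'.

    Returns the mutant sequence and the 0-based position of the change.
    """
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"not a simple missense variant: {variant}")
    ref, pos, alt = match.group(1), int(match.group(2)) - 1, match.group(3)
    if pos >= len(protein) or protein[pos] != ref:
        raise ValueError("reference residue does not match the sequence")
    return protein[:pos] + alt + protein[pos + 1:], pos


def windows_covering(seq: str, pos: int, length: int) -> list[str]:
    """List every window of the given length that contains position pos."""
    first = max(0, pos - length + 1)
    last = min(pos, len(seq) - length)
    return [seq[start:start + length] for start in range(first, last + 1)]


# Toy sequence (hypothetical), built so that one 9-mer window reproduces
# the mutant peptide reported in Table 1.
TOY_PROTEIN = "MKTAYSLFDVSHMLQRS"
mutant_protein, mutated_pos = apply_missense(TOY_PROTEIN, "p.V10A")
nine_mers = windows_covering(mutant_protein, mutated_pos, 9)
print(mutant_protein, nine_mers)
```

In this toy case, one of the 9-mer windows is "SLFDASHML", the mutant counterpart of the wild-type "SLFDVSHML".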
Evaluation of the HepG2 WES-based and COSMIC-based databases
To evaluate the impact of the HepG2 WES-based and COSMIC-based databases on the database search results, we compared the outcomes obtained with the two mutation databases. Venn diagrams showed that the majority of immunopeptides (n = 6,245) were identified with both databases, corresponding to a total of 2,565 shared proteins (Fig. 5a). We first evaluated the quality of the peptides uniquely identified with each mutation database (Fig. 5b). Compared with the commonly identified peptides, most of the uniquely identified peptides from either mutation database had lower hyperscores, indicating lower spectrum quality. Nonetheless, a subset of spectra had hyperscores above the average hyperscore of the commonly identified peptides, suggesting that the use of mutation databases enables the discovery of unique peptides of high quality. We then examined the unique peptides from both databases in more detail. Incomplete product ion coverage was prevalent and caused spectra to match different peptides. Manual inspection of the MS2 spectra revealed that 41 peptides, comprising 11.71% of all unique peptides, contained amino acids or amino acid combinations of equal mass (supplementary Fig. S3a). Furthermore, quantitative evaluation demonstrated a strong correlation between the peptide intensities obtained from the two search results (Fig. 5c).
We observed that uniquely identified peptides from both mutation databases had significantly lower intensities than their commonly identified counterparts. However, there was no significant difference between the intensities of the uniquely identified peptides from the two databases (Fig. 5d). Furthermore, a comparison of the length distributions of commonly and uniquely identified peptides revealed that both groups exhibited a similar distribution, with the majority of peptides having a length of 9 a.a (Fig. 5e). These results imply that the uniquely identified peptides are highly likely to be immunopeptides. To evaluate their immunoaffinity, we used NetMHCpan to predict their binding affinity. The analysis revealed that the proportion of strong binders was lower among the uniquely identified peptides than among their commonly identified counterparts, although strong binders still accounted for nearly 50% (Fig. 5f). Additionally, the distribution of elution percentage rank by peptide length supported this conclusion, as both commonly and uniquely identified peptides showed similar trends. Specifically, 9 a.a peptides showed the lowest percentage rank among strong binders (Fig. 5g, supplementary Fig. S3b). This pattern was also evident in the immunopeptides identified using the UniProt database, providing further evidence that the majority of the uniquely identified peptides were immunopeptides.
Discovery of HepG2-specific mutant peptides using the COSMIC-based database
When evaluating the HepG2 WES-based and COSMIC-based databases for identification of the HLA-I immunopeptidome, we excluded peptides with isoleucine-to-leucine substitutions, as these two residues have identical masses and cannot be distinguished by mass spectrometry. As a result, 16 mutant peptides were identified with the COSMIC-based database (Table 3), including the single mutant peptide discovered using the HepG2 WES-based database. Although both mutation databases include other mutation classes, such as nonsense mutations, insertions, and deletions, all of the mutant peptides identified in our study harbored SNVs. These results indicate that the COSMIC-based database is more efficient than the HepG2 WES-based database for identifying mutant peptides in the HLA-I immunopeptidome.
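The exclusion of isoleucine/leucine substitutions can be expressed as a simple sequence comparison. The sketch below is one possible way to flag such pairs, not the filter used in the study: the function name `differs_only_by_il` is hypothetical, and the first example pair is a constructed illustration rather than an identified peptide.

```python
def differs_only_by_il(mutant: str, wild_type: str) -> bool:
    """Return True if two peptides differ solely by I/L exchanges.

    Isoleucine and leucine residues have the same mass, so peptides that
    differ only at such positions cannot be told apart by peptide mass
    or standard fragment ion masses.
    """
    if len(mutant) != len(wild_type) or mutant == wild_type:
        return False
    # Collapse I and L into one symbol and compare the results.
    return mutant.replace("I", "L") == wild_type.replace("I", "L")


# Constructed example: a single I/L exchange, which would be excluded.
excluded = differs_only_by_il("SLFDISHML", "SLFDLSHML")

# Pair from Table 1: a V-to-A change, which is retained.
retained = not differs_only_by_il("SLFDASHML", "SLFDVSHML")
print(excluded, retained)
```

A pair is flagged only when every difference between the two sequences is an I/L exchange, so a peptide carrying both an I/L exchange and another substitution is retained.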
Table 3
List of mutant peptides identified by COSMIC-based database
No. | Peptide | Origin | Hyperscore | Length | Nucleotide variant | Amino acid variant | Protein | Binding level | HLA allele | Elution rank (%) |
1 | RYSEYTEEF | COSMIC | 22.11 | 9 | c.1795G > A | p.A599T | MTMR6 | Strong | HLA-A*24:02 | 0.003 |
2 | RMPEAAPRV | COSMIC | 11.81 | 9 | c.215C > G | p.P72R | TP53 | Strong | HLA-A*02:01 | 0.084 |
3 | SLFDASHML | HepG2, COSMIC | 11.98 | 9 | c.242T > C | p.V81A | AMT | Strong | HLA-A*02:01 | 0.007 |
4 | TVLSSRPVV | COSMIC | 13.06 | 9 | c.1574T > C | p.I525T | ITGAL | Weak | HLA-B*51:08 | 1.487 |
5 | LSWHLPLLI | COSMIC | 18.14 | 9 | c.26G > A | p.R9H | IL2RB | Weak | HLA-C*16:02 | 1.029 |
6 | LNDLIVALS | COSMIC | 12.87 | 9 | c.1668C > A | p.F556L | NVL | None | | |
7 | KAYGSYEELAKDPN | COSMIC | 11.19 | 14 | c.197G > A | p.S66N | DHDH | None | | |
8 | DEAQNLTRD | COSMIC | 12.61 | 9 | c.2177G > A | p.G726D | DDX54 | None | | |
9 | HGELLEVNL | COSMIC | 14.11 | 9 | c.219C > A | p.D73E | PCDHA13 | None | | |
10 | LFLDAIHLT | COSMIC | 14.69 | 9 | c.2255C > T | p.P752L | BBX | None | | |
11 | DLLLVPTAGL | COSMIC | 18.93 | 10 | c.346T > G | p.Y116D | PROM2 | None | | |
12 | GTLLSGAVGSLLL | COSMIC | 19.86 | 13 | c.508A > T | p.T170S | SLC17A9 | None | | |
13 | HMLIDLHFR | COSMIC | 12.86 | 9 | c.559A > T | p.M187L | FMR1 | None | | |
14 | QVQLLQQQ | COSMIC | 12.12 | 8 | c.593A > T | p.Q198L | TFAP4 | None | | |
15 | DSNRNLDLDSIIA | COSMIC | 23.42 | 13 | c.914A > G | p.N305S | KRT79 | None | | |
16 | QVQIGTHSPP | COSMIC | 12.92 | 10 | c.2125G > A | p.A709T | PHEX | None | | |
NetMHCpan predicted binding to at least one HLA allele for 5 of the mutant peptides (Table 4). Further investigation showed that the mutations found exclusively in HCC were aminomethyltransferase (AMT) p.V81A, integrin alpha-L (ITGAL) p.I525T, and interleukin-2 receptor subunit beta (IL2RB) p.R9H. Interestingly, myotubularin-related protein 6 (MTMR6) p.A599T was also identified in large intestine cancer, while cellular tumor antigen p53 (TP53) p.P72R was confirmed in multiple cancers affecting the bone, skin, meninges, and large intestine (supplementary Fig. S4a). These findings suggest that peptides derived from mutations occurring in multiple cancers could potentially serve as neoantigens, stimulating tumor cytotoxic T-cells against a variety of cancers. Next, we compared the spectral quality and intensities of mutant peptides with those of wild-type peptides in the immunopeptidome. The intensities of mutant peptides were not significantly different from those of normal peptides (Fig. 6a). Specifically, the intensities of mutant peptides were evenly distributed across the overall peptide intensity distribution (Fig. 6c), with the majority of mutant peptides falling within a linear range. However, the hyperscores of mutant peptides were significantly lower than those of normal peptides, indicating poorer spectrum quality for the mutant peptides (Fig. 6b). The distribution of hyperscores revealed that mutant peptides were mainly concentrated in the below-average region (Fig. 6d). A similar distribution pattern was observed for the HLA-I affinity of mutant peptides (Fig. 6e). To assess the spectrum quality of mutant peptides, we manually inspected the spectra, which revealed high product ion coverage, particularly at the mutant amino acid, in spectra with high hyperscores. This finding increased our confidence in the identification of the mutant peptides (Fig. 6f).
Conversely, some high-quality spectra received low hyperscores (supplementary Fig. S4a, b). However, the majority of mutant peptides had lower spectrum quality than wild-type peptides. Low-quality spectra commonly showed incomplete product ion coverage and low relative intensity of product ions, both of which can result in low hyperscores. Furthermore, incomplete product ion coverage may lead to single or multiple amino acid mismatches, causing wild-type peptides to be mistaken for mutant peptides. Such misassignment may also explain the lower proportion of binders among mutant peptides compared with wild-type peptides. For further analysis, 3 mutant peptides were selected based on their affinity and satisfactory spectrum quality.
Table 4
Summary of HLA affinity and molecular docking energy scores
No. | Peptide | Type | HLA affinity (nM) | HLA Allele | HLA template | Docking energy score |
1 | SLFDASHML | Mutant | 4.93 | HLA-A*02:01 | 1DUZ | -232.927 |
2 | SLFDVSHML | Wild type | 4.63 | HLA-A*02:01 | 1DUZ | -256.297 |
3 | RYSEYTEEF | Mutant | 9.98 | HLA-A*24:02 | 5HGA | -260.960 |
4 | RYSEYAEEF | Wild type | 13.32 | HLA-A*24:02 | 5HGA | -213.461 |
We performed molecular structure prediction and peptide-protein docking modeling to confirm the binding affinity of mutant peptides. The results indicated no significant difference in the structure and hydrophobicity of the molecular surface between the mutant peptide "SLFDASHML" and the wild-type peptide "SLFDVSHML" (Fig. 6g). Similarly, the amino acid substitution resulting from the MTMR6 p.A599T mutation did not significantly modify the structure or surface hydrophobicity of "RYSEYTEEF" (Fig. 6h). The predictions from NetMHCpan indicated that these mutant peptides, which were strong binders, differed only minimally in binding affinity from the wild-type peptides (Table 4). This could be explained by the fact that the mutation sites do not coincide with the canonical anchor positions, which typically involve the amino acids at the P2 or PΩ position of the peptide. To further examine the binding of mutant peptides to HLA-I molecules, we performed peptide-protein docking using HPepDock. The mutant peptide "SLFDASHML" and wild-type peptide "SLFDVSHML" were examined in complex with HLA-A*02:01, while the mutant peptide "RYSEYTEEF" and wild-type peptide "RYSEYAEEF" were examined in complex with HLA-A*24:02 (Fig. 6i, j). The results revealed that the binding energy of the HLA-I molecule for the mutant peptide was comparable to that for the wild-type peptide, supporting the predicted HLA-I affinity. The mutant and wild-type peptides exhibited slight differences in their position and conformation within the protein. Furthermore, the hydrogen bonds between the peptide and the protein were altered. These findings suggest that amino acid changes resulting from mutations can affect how a peptide binds to HLA-I molecules, even when the mutation is not located at the P2 or PΩ position.