Neoantigens, or tumor-specific antigens, were first described at the end of the last century (Emens et al., 2017). Advances in DNA sequencing technologies and bioinformatic protocols have improved our understanding of the interactions between tumor cells and the immune system and have enabled the design of cancer vaccines. This strategy is not suitable for all cancer types, because a relatively high mutation rate is needed to generate new antigens. In colorectal cancer, melanoma, and other tumor types, a high number of neoantigens is associated with patient response to immune therapies, making these tumors suitable candidates for cancer vaccines.
In this context, we implemented a bioinformatic protocol to detect variants in chromosome 1 with the potential to generate tumor-specific peptides in samples of human colorectal tumors. We identified 395 SNVs shared by all tumor samples and absent in the normal mucosa sample (see the Supplementary file for the complete list of SNVs).
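The filtering logic described above (variants shared by every tumor sample and absent from the normal sample) can be sketched with plain set operations. This is a minimal illustration, not the pipeline used in the study: the `Variant` tuple layout, the function name `shared_tumor_specific`, and the toy coordinates are all assumptions made for the example.

```python
from typing import Iterable

# Assumed representation: (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def shared_tumor_specific(
    tumor_calls: Iterable[set[Variant]], normal_calls: set[Variant]
) -> set[Variant]:
    """Return variants present in every tumor call set and absent from the normal one."""
    tumor_sets = list(tumor_calls)
    if not tumor_sets:
        return set()
    # Keep only variants called in all tumor samples...
    shared = set.intersection(*tumor_sets)
    # ...then drop anything also seen in the normal sample (e.g. germline variants).
    return shared - normal_calls


# Hypothetical toy data, not taken from the study.
tumor_a = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr1", 3000, "G", "A")}
tumor_b = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T")}
normal = {("chr1", 2000, "C", "T")}

result = shared_tumor_specific([tumor_a, tumor_b], normal)
print(sorted(result))
```

In practice the call sets would be parsed from the variant caller's output files; that parsing step is omitted here.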
Of these, only 28 were non-silent SNVs in exonic regions (see Supplementary file for details), and these were selected to predict neoantigens. These variants were distributed in 26 genes, most of them related to cell cycle control. Interestingly, several of these genes, including NOTCH, OBSCN, HSPG2, and NBPF, have been demonstrated to be expressed in colorectal cancer (Andries et al., 2015; Liao et al., 2018; Shriver et al., 2015). Further analyses are required to assess the role of these specific genes in the development of the disease and in the immune response, including studies of driver genes (important for tumor survival), transcriptomic/proteomic profiling, epigenetic mechanisms, and other genomic variants.
For SNV identification, we used VarScan2 to call variants (mutations). This software has been shown to have high sensitivity (81–100%) in variant calling, in particular with cancer data (Spencer, Tyagi, et al., 2014; Xu, 2018). In addition, several studies advise using a normal tissue sample to filter variants and thereby reduce false positives such as germline mutations (Bae, Kim, & Kang, 2016; Y. Wang et al., 2018), as we did here. Confirmation strategies are still required to validate the identified variants, for example by re-sequencing the genomic regions with Sanger technology. Other variant-calling approaches can also be used to assess other DNA data and to deal with complexity, other sequencing technologies, and biological variability (Z. K. Liu, Shang, Chen, & Bian, 2017).
On the other hand, HLA information for the Costa Rican Central Valley population was available only at two-digit resolution. Thus, we used the HLA supertypes (A*01, A*02, A*03, A*24 and B*07) or the representative allele of each group to predict interactions at the required four-digit resolution. The length of tumor-specific peptides was set to 8–11 amino acids, which is the known size range of peptides that can be presented by HLA-I. It has been suggested that simultaneous testing of peptides between 8 and 11 amino acid residues is advisable in peptide prediction for putative cancer vaccines, because the "core", or common sequence (8 amino acids), may have greater versatility in different fragments (Luo et al., 2015). This allowed us to test several possible peptides for the same variant and to improve the search for possible neoantigens.
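To illustrate how several candidate peptides arise from a single variant, the sketch below enumerates every 8- to 11-residue window of a protein sequence that contains the mutated residue. It is a hedged example rather than the procedure used in the study: the function name `peptides_spanning`, the 0-based `mut_index` convention, and the toy sequence are assumptions.

```python
def peptides_spanning(
    protein: str, mut_index: int, min_len: int = 8, max_len: int = 11
) -> list[str]:
    """List all windows of min_len..max_len residues containing position mut_index (0-based)."""
    if not 0 <= mut_index < len(protein):
        raise ValueError("mut_index is outside the protein sequence")
    peptides = []
    for length in range(min_len, max_len + 1):
        # Earliest window start that still covers the mutated residue.
        first_start = max(0, mut_index - length + 1)
        # Latest start: the window must cover the residue and fit inside the protein.
        last_start = min(mut_index, len(protein) - length)
        for start in range(first_start, last_start + 1):
            peptides.append(protein[start:start + length])
    return peptides


# Hypothetical 21-residue sequence with the mutated residue ("M") at index 10.
toy_protein = "ACDEFGHIKLMNPQRSTVWYA"
candidates = peptides_spanning(toy_protein, mut_index=10)
print(len(candidates), candidates[:3])
```

Each window would then be scored against every HLA allele of interest, which is why one variant can yield binders of different lengths that share a common core.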
Considering the above, 23 candidate peptides were predicted to have affinity for the most prevalent HLA-I alleles of the Costa Rican Central Valley (Table 1). Some peptides were predicted to be presented by different HLA alleles, such as AYLHSPMYF (two different loci, HLA-A*2402 and HLA-C*0401); however, most peptides are predicted to be presented by only one allele.
Regarding the number of peptides per haplotype, the HLA-A locus contributed the most, accounting for the majority of the high-affinity peptides for antigenic presentation (Fig. 3 and Table 1). For example, up to nine peptides can be presented in the case of haplotype A*2402/B*3501, in which the first allele is responsible for the presentation of nine peptides.
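The per-haplotype counts discussed above amount to taking the union of the peptides predicted for each allele in the haplotype. The following sketch shows that bookkeeping under stated assumptions: the mapping `binders`, the function `peptides_for_haplotype`, and the placeholder peptide labels are hypothetical and do not reproduce Table 1.

```python
def peptides_for_haplotype(
    binders: dict[str, set[str]], haplotype: tuple[str, ...]
) -> set[str]:
    """Union of the peptides predicted to bind any allele in the haplotype."""
    presented: set[str] = set()
    for allele in haplotype:
        # Alleles without predicted binders simply contribute nothing.
        presented |= binders.get(allele, set())
    return presented


# Placeholder labels standing in for peptide sequences (not study data).
binders = {
    "HLA-A*24:02": {"PEP_A", "PEP_B", "PEP_C"},
    "HLA-B*35:01": {"PEP_C", "PEP_D"},
}

haplotype = ("HLA-A*24:02", "HLA-B*35:01")
presented = peptides_for_haplotype(binders, haplotype)
per_allele = {allele: len(binders.get(allele, set())) for allele in haplotype}
print(len(presented), per_allele)
```

Using a set union avoids double-counting a peptide that is predicted to bind more than one allele of the same haplotype.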
We also analyzed the physicochemical nature of the peptides (Table 1). Many of them are acidic and, based on the hydrophobicity index, most are slightly hydrophobic. Some "core" sequences recur in several peptides, such as AYLHSPMY, which is part of seven candidate peptides presented by different HLA alleles (HLA-A*24:02, B*35:01, and C*04:01).
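As an illustration of this kind of analysis, the sketch below computes a mean hydropathy score per peptide and finds which peptides contain a given core sequence. The text does not state which hydrophobicity index was used; the Kyte-Doolittle scale (averaged, often called GRAVY) is assumed here purely for the example, and the function names are hypothetical. Acidity (isoelectric point) is not covered.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the study's index is not specified).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Mean Kyte-Doolittle hydropathy; positive values indicate a hydrophobic peptide."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in peptide) / len(peptide)


def peptides_with_core(core: str, peptides: list[str]) -> list[str]:
    """Return the peptides that contain the core sequence as a substring."""
    return [p for p in peptides if core in p]


# AYLHSPMYF and its core come from the text; the second peptide is a made-up control.
example_peptides = ["AYLHSPMYF", "GGGGGGGGG"]
score = gravy("AYLHSPMYF")
sharing_core = peptides_with_core("AYLHSPMY", example_peptides)
print(round(score, 3), sharing_core)
```

A different hydropathy scale would change the absolute values, so any comparison between peptides should use a single scale throughout.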
For peptide selection, prediction algorithms use metrics such as the IC50 value (the concentration necessary to displace 50% of a reference peptide bound to an HLA allele) (Giannakis et al., 2016; Karasaki et al., 2017). However, this metric has been shown to yield a greater number of false positives in the detection of neopeptides, since some alleles can bind a significant percentage of random peptides (Nielsen & Andreatta, 2016). For this reason, the present investigation used only peptides with strong predicted affinities (rank < 0.5%).
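The rank-based selection can be expressed as a simple filter over prediction records. This is a sketch under assumptions: the `Prediction` record, its field names, and the numeric values are invented for illustration and do not correspond to the output format of any particular predictor or to the study's results; only the 0.5% threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    peptide: str
    allele: str
    rank_percent: float  # percentile rank of the predicted affinity (lower is stronger)


# Threshold taken from the text: strong binders have rank < 0.5%.
STRONG_BINDER_RANK = 0.5


def strong_binders(
    predictions: list[Prediction], threshold: float = STRONG_BINDER_RANK
) -> list[Prediction]:
    """Keep predictions whose percentile rank is strictly below the threshold."""
    return [p for p in predictions if p.rank_percent < threshold]


# Made-up rank values for illustration only.
predictions = [
    Prediction("AYLHSPMYF", "HLA-A*24:02", 0.1),
    Prediction("AYLHSPMYF", "HLA-A*02:01", 12.0),
    Prediction("GGGGGGGGG", "HLA-B*07:02", 0.5),
]

selected = strong_binders(predictions)
print([(p.peptide, p.allele) for p in selected])
```

Because the rank compares each peptide with a background of random peptides for the same allele, it is less affected than a fixed IC50 cutoff by alleles that bind many peptides.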
Other biological factors that were not included in our study should also be considered when identifying possible neoantigens. For example, protein degradation is possible at the endoplasmic reticulum (Fleri et al., 2017). In addition, peptide affinity must be validated using in vitro assays with antigen-presenting cells from the Costa Rican population, for example the enzyme-linked immunospot assay (ELISPOT) (Kato et al., 2018).