Not all new peptides generated by gene mutations are immunogenic. Only a small fraction of mutations are processed into peptides, presented on the surface of tumor cells by MHC class I (MHC-I) molecules, and recognized by T cells as foreign neoepitopes[6, 7]. CD8 T cells recognize antigens presented on tumor cells by MHC-I (HLA class I in humans)[6]. Thus, to find neoepitopes, DEGs between tumor cells and normal cells must be analyzed; in parallel, the MHC type should be determined, and the affinity of the TCR for the peptide/MHC-I complex should subsequently be measured. Neoantigens were reported as early as 1994, when a mutated cyclin-dependent kinase 4 (CDK4) was shown to be a tumor-specific antigen recognized by HLA-A2.1-restricted autologous cytolytic T lymphocytes in a human melanoma[8]. However, comprehensive identification of neoantigens remained difficult until the advent of next-generation sequencing (NGS). Since high-throughput sequencing first appeared in 2005[9], it has become possible to study life activities at the molecular level and to characterize the genome and transcriptome in detail[10]. First-generation sequencing technology (Sanger sequencing) uses a DNA polymerase to extend a primer bound to the template to be sequenced until a chain-terminating nucleotide is incorporated. High-throughput sequencing is a revolutionary departure from traditional Sanger sequencing: it can sequence hundreds of thousands to millions of nucleic acid molecules at one time. It is therefore also called second-generation sequencing, a name that reflects the scale of this change. In addition to NGS, there is third-generation sequencing (TGS), which allows long-read sequencing of individual DNA molecules[11].
RNA sequencing is a powerful deep-sequencing method for transcriptome profiling that has been widely used to detect gene alterations, gene fusions and somatic mutations[12]. When applied to mutation detection, it is not as accurate as DNA sequencing, but it may provide more information. Gene mutations, mainly single-nucleotide polymorphisms (SNPs) or single-nucleotide variants (SNVs) and insertions/deletions (INDELs), can affect protein function and lead to tumor development. The altered proteins produced by mutations can also act as tumor antigens and serve as targets for tumor immunotherapy. With the development of NGS and other technologies, analysis of DNA or RNA sequencing data can uncover more gene mutations. On the one hand, this helps to identify possible causes of tumors; on the other hand, the mutations can be analyzed from an immunological perspective to find neoantigens for tumor immunotherapy.
BRAFV600E mutation and RAS mutation are common gene mutations in PTC. In TCGA data, BRAF mutations accounted for 61.7% (248/402) of cases (almost all BRAFV600E), and RAS mutations accounted for 12.9% (52/402)[13]. A study by Liang et al. in the Chinese population found that BRAF mutations accounted for 72.4% (257/355), almost all of them BRAFV600E, while RAS mutations accounted for 2.8% (10/355)[14]. The results obtained with VarScan2 in our study are similar: BRAF mutations were frequent (50%, 5/10), while RAS mutations were relatively rare (10%, 1/10). In our study, BRAF and RAS mutations were mutually exclusive in every case, consistent with many studies, including the TCGA data and the study by Liang et al. The variants with the highest frequency were 9 SNVs and 1 INDEL, seen in 5 patients (5/10). Although these somatic SNVs and INDELs, except for the BRAF mutation, did not appear in the TCGA data, these high-frequency mutations may be related to the occurrence and development of PTC.
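The frequencies and the mutual-exclusivity observation above can be checked with a few lines of arithmetic. The following is only a minimal sketch: the cohort counts are those quoted in the text, but the per-patient calls, the patient IDs and the function names are hypothetical placeholders, not the study data.

```python
def percent(carriers, cohort_size):
    """Mutation frequency as a percentage, rounded to one decimal place."""
    return round(100 * carriers / cohort_size, 1)


def mutually_exclusive(calls, gene_a, gene_b):
    """True if no patient carries mutations in both genes."""
    return not any(gene_a in genes and gene_b in genes for genes in calls.values())


# Cohort counts quoted in the text.
tcga_braf = percent(248, 402)    # TCGA, BRAF
tcga_ras = percent(52, 402)      # TCGA, RAS
liang_braf = percent(257, 355)   # Liang et al., BRAF
liang_ras = percent(10, 355)     # Liang et al., RAS

# Hypothetical per-patient calls (placeholder IDs, NOT the study data):
# 5 BRAF-mutated patients, 1 RAS-mutated patient, 4 with neither.
calls = {
    "P01": {"BRAF"}, "P02": {"BRAF"}, "P03": {"BRAF"}, "P04": {"BRAF"},
    "P05": {"BRAF"}, "P06": {"RAS"}, "P07": set(), "P08": set(),
    "P09": set(), "P10": set(),
}
braf_carriers = sum("BRAF" in genes for genes in calls.values())

print(tcga_braf, tcga_ras, liang_braf, liang_ras)
print(percent(braf_carriers, len(calls)), mutually_exclusive(calls, "BRAF", "RAS"))
```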
Gene fusion also plays an important role in the occurrence of many tumors, including thyroid cancer[15], and can be used for tumor diagnosis and as a target for tumor immunotherapy[16]. Targeting fusion gene products is a promising therapeutic strategy. For example, drugs targeting ALK and ROS1 fusion gene products are in clinical use or clinical trials. RET fusions, BRAF fusions and NTRK1/2/3 fusions are also considered promising targets for targeted therapy[17]. We used SOAPfuse and EricScript to predict fusion genes. A total of 31 fusion genes were identified, of which 25 appeared only once. Twelve fusion genes appeared only in cancer tissues. EPS8L2/TALDO1, which appeared only in cancer tissues, had a frequency of 3/10. DUOXA1 and DUOX2 are both involved in the synthesis of thyroid hormone. It can therefore be speculated that the EPS8L2/TALDO1 and DUOXA1/DUOX2 fusion genes may be related to the progression of PTC.
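The step of keeping only fusions seen in cancer tissue and counting how many patients carry each one can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the record layout, the function name and the placeholder fusions GENE_A/GENE_B and GENE_C/GENE_D are hypothetical, and the toy records are not the study data.

```python
from collections import defaultdict


def tumor_only_fusions(records):
    """Map each fusion seen only in tumor tissue to its number of carrier patients.

    `records` is an iterable of (patient, tissue, fusion) tuples, where
    tissue is "tumor" or "normal" (a hypothetical layout for this sketch).
    """
    tumor_carriers = defaultdict(set)
    seen_in_normal = set()
    for patient, tissue, fusion in records:
        if tissue == "tumor":
            tumor_carriers[fusion].add(patient)
        else:
            seen_in_normal.add(fusion)
    return {
        fusion: len(patients)
        for fusion, patients in tumor_carriers.items()
        if fusion not in seen_in_normal
    }


# Toy records (placeholder patients; GENE_* fusions are made up).
records = [
    ("P01", "tumor", "EPS8L2/TALDO1"),
    ("P02", "tumor", "EPS8L2/TALDO1"),
    ("P03", "tumor", "EPS8L2/TALDO1"),
    ("P01", "tumor", "GENE_A/GENE_B"),
    ("P01", "normal", "GENE_A/GENE_B"),   # also in normal tissue: excluded
    ("P04", "normal", "GENE_C/GENE_D"),   # normal tissue only: excluded
]
print(tumor_only_fusions(records))
```

With these toy records, only EPS8L2/TALDO1 survives the filter, with three carriers, which mirrors the 3/10 frequency reported above.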
The HLA gene complex is the human MHC and the most polymorphic gene system in the human body[18]. Recent studies have shown that HLA genotypes are related to the occurrence of thyroid diseases. For example, HLA class I is related to the occurrence of Graves' disease, and HLA-B*51:01 is related to the occurrence of PTC in the Han population of the coastal areas of Shandong Province[19, 20]. This implies that HLA subtyping is important both from an immunological perspective and for understanding the occurrence of thyroid diseases. Currently, common DNA-based typing methods are sequence-specific oligonucleotide (SSO) typing, PCR with sequence-specific primers (PCR-SSP), and PCR sequence-based typing (SBT). The PCR-SSP method was used in our study: a series of specific primers is designed to amplify DNA fragments of particular HLA genotypes, and gel electrophoresis is then performed to determine the allele subtype from the presence or absence of electrophoretic bands. However, this method is prone to ambiguous results, and as new subtypes are discovered or higher resolution is required, a large number of sequence-specific primers are needed. NGS-based HLA genotyping has obvious advantages in the amount of information and the resolution of HLA genotypes, and it can discover new HLA alleles. In our study, at 2-digit resolution, the two methods differed in 3 cases for HLA-B and 1 case for HLA-C (4/30 in total); the remaining HLA-A/B/C subtypes were identical, giving a consistency rate between PCR-SSP and HLAprofiler of 86.67% (26/30). HLAprofiler is better than PCR-SSP in accuracy and comprehensiveness. Another advantage of HLAprofiler is that it can analyze the HLA gene subtypes of cancer tissue and thyroid tissue separately. If the HLA subtypes of cancer tissue and thyroid tissue differ, HLA subtype variation is possible, and this may be a mechanism by which the cancer evades the body's immunity[21].
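The 2-digit comparison behind the 86.67% consistency rate can be sketched as below. This is a minimal illustration that assumes the 30 comparisons correspond to 10 patients by 3 loci (HLA-A/B/C), which the text implies but does not state; the function names and the toy typing calls are hypothetical.

```python
def two_digit(allele):
    """Truncate an allele name to 2-digit resolution, e.g. "B*51:01" -> "B*51"."""
    return allele.split(":")[0]


def concordance(calls_a, calls_b):
    """Count (patient, locus) entries where both methods agree at 2-digit level.

    Each argument maps (patient, locus) to a pair of allele names.
    Returns (number of agreeing entries, number of compared entries).
    """
    shared = calls_a.keys() & calls_b.keys()
    agree = sum(
        sorted(map(two_digit, calls_a[key])) == sorted(map(two_digit, calls_b[key]))
        for key in shared
    )
    return agree, len(shared)


# Toy calls for one placeholder patient (NOT the study data).
ssp_calls = {
    ("P01", "A"): ("A*02", "A*11"),
    ("P01", "B"): ("B*51", "B*46"),
}
ngs_calls = {
    ("P01", "A"): ("A*11:01", "A*02:01"),   # same pair, different order
    ("P01", "B"): ("B*51:01", "B*15:02"),   # second allele differs
}
print(concordance(ssp_calls, ngs_calls))

# Rate reported in the text: 4 discordant entries out of 30.
reported_rate = round(100 * (30 - 4) / 30, 2)
print(reported_rate)
```

Sorting the allele pair before comparing makes the check independent of the order in which each method reports the two alleles of a locus.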
A major function of the immune system is to detect threats from foreign invaders, tissue damage, or cancer and to mount a counter-response that resolves the threat, restores homeostasis, and supplies immunological memory to prevent a second assault[22]. The purpose of tumor immunotherapy is to use the body's immune system to fight cancer cells without harming the body's normal tissues or organs. It is a promising method for the treatment of cancers. In recent years, the use of inhibitors of the immune checkpoints cytotoxic T lymphocyte-associated antigen-4 (CTLA-4) and programmed cell death protein 1 (PD1) to treat cancers such as melanoma has yielded good results[23, 24]. Activation of the adaptive immune system requires the presence of a suitable immunogen. Tumor-specific mutations can generate neoantigens if peptides containing the mutation are presented on the surface of tumor cells by MHC-I molecules and are recognized by T cells as neoepitopes[7], which are absent in normal tissues. Tumor immunotherapy directed at these antigens is more targeted and relatively safer. Intracellular polypeptides must bind to HLA molecules and be presented on the cell surface to be recognized by immune cells. The polymorphism of HLA determines the individual's immune response and susceptibility to diseases including cancer[25], and the molecular changes that cause malignant transformation of cells differ between tumors. HLA polymorphism makes the analysis of HLA allele genotypes extremely complicated. With the development of bioinformatics, relevant algorithms and software are continually being developed and updated to address these complexities, for example by predicting gene variation, analyzing HLA alleles, predicting the binding affinity between MHC molecules and peptides, and analyzing MHC-peptide complexes. pVACtools and INTEGRATE-neo, used in our study, are examples of tools that integrate such algorithms and software to improve the efficiency of antigen prediction.
These computational methods predict neoantigens based on the predicted binding affinity of the mutated peptides to MHC-I molecules[7]. These bioinformatics tools also provide firm support for more precise treatment.
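Before binding affinity can be predicted, the candidate peptides containing the mutated residue have to be enumerated. The following is a minimal sketch of that idea for a single amino acid substitution, assuming 9-residue peptides (a common length for MHC-I ligands). The sequence, the substituted residue and the function names are made up for illustration; tools such as pVACtools perform this step internally in their own way.

```python
def apply_substitution(protein, pos, alt):
    """Return the protein with the residue at 0-based `pos` replaced by `alt`."""
    return protein[:pos] + alt + protein[pos + 1:]


def mutant_peptides(protein, pos, length=9):
    """All windows of `length` residues that cover 0-based position `pos`.

    Returns an empty list if the protein is shorter than `length`.
    """
    first_start = max(0, pos - length + 1)
    last_start = min(pos, len(protein) - length)
    return [protein[s:s + length] for s in range(first_start, last_start + 1)]


# Made-up 20-residue sequence; position 10 ("M") is substituted with "K".
wild_type = "ACDEFGHIKLMNPQRSTVWY"
mutant = apply_substitution(wild_type, 10, "K")
peptides = mutant_peptides(mutant, 10)

for peptide in peptides:
    print(peptide)
```

For a substitution far enough from both ends of the protein, this yields one peptide per possible offset of the mutated residue, that is, nine 9-mers.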
Potential neoantigens or epitopes can be filtered using algorithms that predict MHC-peptide binding affinity. Trolle et al. used 44 datasets covering 17 MHC alleles and more than 4000 peptide-MHC complexes to benchmark affinity prediction. Of the four participating servers, NetMHCpan performed best, followed by ANN, SMM and finally ARB[26]. In our study, we used NetMHCpan to predict the affinity between MHC class I molecules and peptides and then to select potential neoantigens. The epitopes derived from somatic mutations could be present in up to 3 patients (3/10) at the same time and could bind up to 5 HLA class I alleles. The 9 peptide epitopes derived from fusion genes were generally present in 1 patient (1/10) and could bind up to 2 HLA class I alleles. This result suggests that some epitopes containing variant sites may be shared by several patients and may also have good affinity for several HLA molecules. This needs further experimental confirmation.
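The filtering and counting described above can be sketched as follows. This is an illustration only: the 500 nM IC50 cutoff is a commonly used convention for calling binders and is an assumption here, since the text does not state the threshold used; the peptides, affinity values, record layout and function name are hypothetical and do not come from NetMHCpan output or from the study.

```python
from collections import defaultdict


def summarize_binders(predictions, threshold_nm=500.0):
    """Summarize predicted binders per peptide.

    `predictions` is an iterable of (patient, peptide, allele, ic50_nm)
    tuples. A peptide-allele pair counts as a binder when its predicted
    IC50 is at or below `threshold_nm` (an assumed cutoff).
    Returns {peptide: (number of patients, number of alleles bound)}.
    """
    patients = defaultdict(set)
    alleles = defaultdict(set)
    for patient, peptide, allele, ic50_nm in predictions:
        if ic50_nm <= threshold_nm:
            patients[peptide].add(patient)
            alleles[peptide].add(allele)
    return {p: (len(patients[p]), len(alleles[p])) for p in patients}


# Toy predictions (made-up peptides and affinities).
predictions = [
    ("P01", "PEPTIDEAA", "HLA-A*02:01", 35.0),
    ("P02", "PEPTIDEAA", "HLA-A*11:01", 120.0),
    ("P02", "PEPTIDEAA", "HLA-B*51:01", 4200.0),   # too weak: ignored
    ("P03", "PEPTIDEBB", "HLA-A*02:01", 900.0),    # too weak: ignored
]
print(summarize_binders(predictions))
```

In this toy input, the first peptide is a predicted binder in two patients across two alleles, which is the kind of per-epitope summary (patients sharing the epitope, alleles bound) reported above.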
We should note the limitations of our study: all the gene alterations and the predicted neoantigen epitopes need to be further corroborated by substantial experiments. Nevertheless, the study is of great value in identifying potential novel epitopes and biomarkers for PTC immunotherapy.