Identification of Chimeric Transcript Longest Read-derived peptides (cLRPs)
We identified several candidate chimeric transcripts (CTs) in TCGA RNA-Seq data from 160 OvCa samples by applying a customized computational workflow on the Seven Bridges Cancer Genomics Cloud platform (SBGenomics; http://www.cancergenomicscloud.org). All spanning reads of each CT were consolidated to derive longest read (LR) CT sequences (ranging from 71 to 148 bp). To resolve the coding capabilities of the CTs, we developed a customized chimeric LR peptide (cLRP) database comprising in silico translations of all LR sequences in 6 reading frames (RF), along with ENSP protein sequences and randomly generated decoy peptides (Additional File 1; https://www.ensembl.org). This database was applied as a backend reference for probing ovarian tumor mass spectra datasets generated by the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC; https://proteomics.cancer.gov/data-portal; Edwards et al. 2015), using the FenyoLab xtandem pipeline on the SBGenomics Cloud (Fig. S1a). This pipeline uses the X! Tandem search engine with pre-defined search parameters (Ruggles et al. 2016; Ardeljan et al. 2019). Peptide spectral matches (PSMs) that met a confidence threshold were selected from the outputs of this X! Tandem-based pipeline. Corresponding iTRAQ reporter ion intensities from spectral data (3 tumor samples and 1 site-specific pooled reference, labeled either with iTRAQ [email protected]/[email protected]) were extracted to compute relative peptide abundances from sample enrichment ratios and the summation of reporter ion intensities from different scans and bRPLC fractions, selecting for FDR values <0.01 (expect values 10^-2). The order of magnitude of the relative peptide expression range was then computed as the log2-transformed total peptide abundance. The charge state of PSMs was examined in terms of b- and y-ion fragmentation profiles using Proteome Discoverer™ (Thermo Scientific™).
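The six-frame in silico translation used to build the cLRP database can be illustrated with a minimal sketch. It assumes the standard genetic code and plain A/C/G/T input; the function names and the frame labels (+1 to +3, -1 to -3) are illustrative choices and are not taken from the workflow described above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X', stops '*'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate a sequence in three forward and three reverse frames."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")  # toy sequence, not a real LR
```

Each LR sequence of 71 to 148 bp would yield six short translated frames in this scheme, which could then be written out alongside reference and decoy sequences.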
Prediction of antigenicity
The 9-11 residue degradation products of the tumor-specific cLRPs were predicted using the PCPS Proteasomal Cleavage Prediction Server (http://imed.med.ucm.es/Tools/pcps/; Gomez-Perosanz et al. 2020), followed by the TAPPred server (http://crdd.osdd.net/raghava/tappred/; Raghava, 2004) to predict the binding affinity (IC50) of peptides towards the TAP transporter and to generate TAP scores. MHC Class I alleles were selected for screening from two geographical regions, viz. India (South Asia) and the US (North America), from the HLA Allele Frequency Net Database (http://www.allelefrequencies.net; Gonzalez-Galarza et al. 2020; A:B:C alleles - 245:404:120), as outlined in detail in Additional File 2. Binding of peptides to MHC Class I was predicted using NetMHCpan 4.1, which implements neural network alignment (NNAlign) and is trained on binding affinity (BA) data and on eluted ligand (EL) datasets from mass spectrometry of MHC-presented antigens, using concurrent motif deconvolution (the process of associating each ligand with its MHC molecule). It ranks peptides based on BA as a percentile score with respect to the prediction of the top 100 peptides; hence a lower BA rank corresponds to a stronger MHC I binder. Haplotype-based determination of the stability and affinity of potential neoantigens was performed with NetMHCstabpan 1.0 (NetMHCstabpan - 1.0 - Services - DTU Health Tech), which predicts the stability and affinity of a peptide towards an allele; a threshold of 0.5% combined rank stability was set to identify the peptides most likely to bind to MHC molecules, with T-Half (predicted half-life) >2 h and IC50 values <100 nM.
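The final selection step can be sketched as a simple filter over parsed prediction records. The record layout and field names below are hypothetical and do not reproduce the NetMHCstabpan output format; the peptide strings and numeric values are placeholders. Treating the 0.5% combined rank as an inclusive upper bound is an assumption, since the text does not say whether the cutoff is strict.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One peptide:allele prediction (hypothetical field names)."""
    peptide: str
    allele: str
    combined_rank_pct: float  # combined stability/affinity rank, in percent
    half_life_h: float        # predicted T-Half, in hours
    ic50_nm: float            # predicted IC50, in nM


def passes_thresholds(pred, max_rank_pct=0.5, min_half_life_h=2.0,
                      max_ic50_nm=100.0):
    """Apply the three cutoffs stated in the text.

    Assumption: the rank cutoff is inclusive (<= 0.5%); the half-life and
    IC50 cutoffs are strict, as written (>2 h, <100 nM).
    """
    return (pred.combined_rank_pct <= max_rank_pct
            and pred.half_life_h > min_half_life_h
            and pred.ic50_nm < max_ic50_nm)


# Placeholder records for illustration only; not results from the study.
records = [
    Prediction("AAAAAAAAA", "HLA-A*11:01", 0.30, 3.5, 42.0),
    Prediction("CCCCCCCCC", "HLA-A*11:01", 0.30, 1.2, 42.0),
    Prediction("DDDDDDDDD", "HLA-A*24:02", 1.80, 4.0, 60.0),
]
selected = [r for r in records if passes_thresholds(r)]
```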
Derivation of Allele Harmonic Binding Rank (AHBR) and Peptide Harmonic Best Rank (PHBR)
PHBR represents the inverse average of a specific peptide expressed across patient samples, and was determined as follows –
where x is the number of all peptides from the corresponding cLRP detected in mass spectra from all tumor samples, and y is a score for a predicted peptide being antigenic (score = 1 if the antigenic peptide is specifically detected in mass spectrometry data; score = 2 if other peptides from the same cLRP are detected).
AHBR was calculated as the inverse average of the binding affinity (BA) rank across all MHC subtypes restricted by a particular peptide, i.e., from the ratio of the summation of the BA ranks of the MHCs restricted by the respective peptide to the total number of restricted sub-alleles (a).
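Because the AHBR expression itself is not reproduced in this text, the following sketch only illustrates the two readings that the verbal description allows: (i) the plain ratio of summed BA ranks to the number of restricted sub-alleles a, and (ii) a harmonic ("inverse average") mean, as the name of the score suggests. Which form was actually used is not established here, and both function names are illustrative.

```python
def mean_ba_rank(ba_ranks):
    """Reading (i): sum of BA ranks divided by the number of sub-alleles a."""
    a = len(ba_ranks)
    if a == 0:
        raise ValueError("at least one restricted sub-allele is required")
    return sum(ba_ranks) / a


def harmonic_ba_rank(ba_ranks):
    """Reading (ii): harmonic mean, a / sum(1 / rank), an assumed form."""
    a = len(ba_ranks)
    if a == 0 or any(r <= 0 for r in ba_ranks):
        raise ValueError("ranks must be positive and non-empty")
    return a / sum(1.0 / r for r in ba_ranks)


# Illustrative BA percentile ranks for one peptide across three sub-alleles.
example_ranks = [0.5, 1.0, 2.0]
arithmetic = mean_ba_rank(example_ranks)
harmonic = harmonic_ba_rank(example_ranks)
```

The harmonic form weights the strongest binders (lowest ranks) more heavily than the arithmetic ratio, so the two readings diverge most for peptides with a mix of strong and weak restricting alleles.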
Peptide MHC Docking
Peptide structures were designed in PepFold3D (RPBS Web Portal, univ-paris-diderot.fr); the model 1 structure (by convention considered the most stable) was visualized in PyMol 4.2 (PyMOL | pymol.org, The PyMOL Molecular Graphics System, Version 1.2r3pre, Schrödinger, LLC), and torsion angles were edited using AutoDock 4. PDB structures of relevant and available HLA molecules were downloaded from IMGT (IMGT/3Dstructure-DB and IMGT/2Dstructure-DB Query page) and cleared of peptide ligands and artifacts from X-ray crystallographic analysis using PyMol 4.2. These were submitted to CASTp [CASTp 3.0: Computed Atlas of Surface Topography of proteins (uic.edu)] for prediction of six peptide binding pockets within each molecule, as below.
Table 1: MHC pocket (A-F) – associated residues derived from reported pdb structures
Grid box dimensions (size and coordinates) for peptide binding in the PDB structure of each MHC I molecule were defined in AutoDock 4 (Trott et al. 2010), applying the above pocket information as reference. The final pdbqt files were provided as the input for docking in AutoDock Vina, and outputs were visualized in PyMol and Discovery Studio Visualizer (BIOVIA, Dassault Systèmes, Discovery Studio Visualizer, v188.8.131.5298, San Diego: Dassault Systèmes, 2021). Successful docking outcomes presented the peptide centered within the defined grid. Of the nine conformations returned, the ligand conformation and log file of affinity data (kcal/mol) of the first PDB structure (lowest RMSD values) were considered the best prediction. Complexes with affinity values of -9.00 to -6.00 kcal/mol, polar bond lengths <3.00 Å, and electrostatic and hydrophobic bond lengths <4.00 Å were considered stable, and their interactions were studied against reference peptide:allele (R-pMHC) complexes derived through X-ray crystallography.
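The stability criteria for docked complexes can be expressed as a short check. This is a sketch only: the function name and argument layout are hypothetical, bond lengths are assumed to have been measured beforehand (for example in a molecular viewer), and treating the affinity window as inclusive at both ends is an assumption.

```python
def is_stable_complex(affinity_kcal_mol, polar_bond_lengths,
                      other_bond_lengths):
    """Check a docked peptide:MHC complex against the stated criteria.

    affinity_kcal_mol  : affinity of the selected pose, in kcal/mol
    polar_bond_lengths : polar bond lengths, in angstroms
    other_bond_lengths : electrostatic and hydrophobic bond lengths, in angstroms

    Assumption: the -9.00 to -6.00 kcal/mol window is inclusive. A complex
    with no measured bonds of a given kind passes that length check vacuously.
    """
    affinity_ok = -9.0 <= affinity_kcal_mol <= -6.0
    polar_ok = all(d < 3.0 for d in polar_bond_lengths)
    other_ok = all(d < 4.0 for d in other_bond_lengths)
    return affinity_ok and polar_ok and other_ok


# Illustrative values only; not measurements from the study.
stable = is_stable_complex(-7.4, [2.1, 2.8], [3.5, 3.9])
```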
GibbsCluster 2.0 [GibbsCluster-2.0 Server (dtu.dk)] was applied to input test peptide sequences along with a set of validated restricting peptides for the respective alleles obtained from MHCBN: A comprehensive database of MHC binding and Non-binding peptides (osdd.net), VDJdb (cdr3.net) and published literature (Cole et al. 2006). Default MHC class I parameters were selected to identify the clusters with the highest Kullback-Leibler distance (Andreatta et al. 2017) and probability scores, in order to predict a binding motif for each allele.
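To make the cluster-scoring criterion concrete, the sketch below computes a Kullback-Leibler distance for one cluster of equal-length peptides against a background amino acid distribution. It is a deliberately simplified stand-in: a uniform background is assumed by default, and the server's own scoring refinements (such as sequence weighting and pseudo-count corrections) are not reproduced. The function name and parameters are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kl_distance(peptides, background=None):
    """Sum over positions of sum_a p(a) * log2(p(a) / q(a)).

    peptides   : aligned peptides of identical length (one cluster)
    background : dict of amino acid -> frequency; uniform if omitted
                 (an assumption made for this sketch)
    """
    if not peptides:
        raise ValueError("cluster is empty")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must share one length")
    q = background or {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    total = 0.0
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        for residue, count in counts.items():
            freq = count / len(peptides)
            total += freq * math.log2(freq / q[residue])
    return total


# A fully conserved column contributes log2(20) bits under a uniform background.
score = kl_distance(["AC", "AD"])
```

Under this simplified score, clusters with more conserved positions receive a higher distance, which is the property used to pick the most informative motif.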
Determination of Agretopicity Index and Validation of antigenicity of parental proteins
A modified agretopicity index was derived to compare the binding affinities of antigenic epitopes with the corresponding 8-11-mer amino acid sequences in the 6-frame in silico translated outputs of their transcript sequence. Shorter peptides, generated in some cases due to the presence of stop codons in the alternative frames, were disregarded, and only 9-mers were processed (NetMHCstabpan 1.0) for comparison of affinities with the same allele. Similarly, 6-frame protein isoforms of the specific parental transcripts generating the cLR of the PSEN2-CABC1 transcript, viz. PSEN2-001 (ENST00000366783.3) and CABC1, ADCK3-004 (ENST00000366779.1; Ensembl Genome Browser, https://www.ensembl.org, Homo_sapiens - GRCh37), were predicted in silico, and each of the 12 generated proteins was processed through the established pipeline for prediction of antigenicity as above.
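The handling of stop codons in alternative frames can be sketched as follows: each translated frame is split at stop codons, fragments shorter than nine residues yield nothing, and all overlapping 9-mers are enumerated from the remaining fragments. The function name and the use of '*' as the stop symbol are assumptions of this sketch, and the index calculation itself is not shown because its formula is not given in the text.

```python
def ninemers_from_frame(translated_frame, k=9, stop_symbol="*"):
    """Enumerate overlapping k-mers from one translated reading frame.

    The frame is split at stop symbols; any open fragment shorter than k
    contributes no peptides, which mirrors discarding the shorter peptides.
    """
    peptides = []
    for fragment in translated_frame.split(stop_symbol):
        for start in range(len(fragment) - k + 1):
            peptides.append(fragment[start:start + k])
    return peptides


# Toy translated frame: a 10-residue fragment, a stop, then a 3-residue fragment.
candidates = ninemers_from_frame("MKTAYIAKQR*GLS")
```

The resulting 9-mers from each frame could then be submitted with the same allele as the antigenic epitope so that the affinities are compared like for like.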
Docking of p:MHC:TCR complexes
CDR3 sequences of the α and β chains, with associated V and J genes, of the available pMHC-TCR complexes with alleles HLA-A*11:01, HLA-A*24:02 and HLA-B*27:05 (Ladell, 2014) were derived from the VDJ database (VDJdb: https://cdr3.net). This provided references of 2 TCR: viral (EBV) epitopes: HLA-A*11:01: 2; 18 TCRs: human epitopes: HLA-A*24:02: 4; and 1 TCR: viral (HIV) epitopes: HLA-B*27:05: 4 (ImmuneScape VDJ Assembler, https://tcr2.cs.biu.ac.il/home; https://sysimm.ifrec.osaka-u.ac.jp/immune-scape/mhc1). Ten peptides restricting 3 alleles were modeled with the respective complexed TCR, selecting TCR sequences where an epitope of human origin was available; in the case of epitopes of viral origin, TCR structures with a minimum VDJdb acquisition score of 1 were considered structures of confidence. ERGO-II (https://biu.ac.il) was used to derive p:MHC:TCR binding scores for each test peptide:allele complex and the CDR3 sequences of the α and β chains with associated V and J genes. The autoencoder-based model, with the VDJ database as the training set, was provided as the parameters. Full-length TCR-αβ chain sequences, derived by the ImmuneScape VDJ chain assembler from the CDR3 and VJ genes, were then provided for modeling in ImmuneScape Modeler along with the epitope and HLA alleles to obtain a PDB output, which in turn was provided as input to the TCR 3D repertoire database (https://umd.edu) for derivation of incident and docking angles and TCR-CoM coordinates. Further, PyMol and Discovery Studio were used to visualize the PDB structures for comparison of polar bond lengths and interacting residues with reference PDB structures obtained in VDJdb wherever available, while another complex structure was generated for HLA-A*11:01 towards evaluation of the obtained TCR-p:MHC complexes.
Comparison with pre-reported neoantigens
Pre-validated neoantigens reported in melanoma (Ott et al. 2020) and glioblastoma (Keskin et al. 2019) were processed through the same pipeline deployed in the present study. For the melanoma dataset, a reference peptide, DELEIKAY, reported to restrict HLA-B*18:01 (PDB ID: 4XXC; https://www.rcsb.org/structure/4XXC), was selected for simulation of docking and comparison of interactions with the predicted peptides. Two TCRs, viz. TRAV19*01-J3*01:TRBV20-1*01-J2-7*01 and TRAV1-2*01-J32*01:TRBV18*01-J1-4*01, reported to complex with HLA-B*18:01 and epitopes of human origin (EEAAGIGIL, MEVDPIGHLY; VDJ assembler), were derived to execute the second level of molecular modeling and identify interactions similar to those in the respective references. For glioblastoma, simulation of docking was performed for peptides predicted to restrict HLA-B*27:05 and compared with the reported reference, RRKWRRWHL (https://www.rcsb.org/structure/5IB1), followed by secondary docking with the TCR TRAV14/DV4/J21-TRBV6-5*01/J1-1*01, reportedly complexed with HLA-B*27:05 and an epitope of viral (HIV) origin (KRWIILGLNK; VDJ assembler), for comparison of interactions with those of the reference.
Two groups of patients were demarcated within the cohort based on their TcTP burden (lower and higher than the median values of CTs and cLRPs), viz. Group 1 (G1; n=51, low TcTP) and Group 2 (G2; n=50, high TcTP). Kaplan-Meier (K-M) plots were constructed using the survival package in R and tested for significance by the log-rank test (p<0.05). Differences between the K-M curves for the 2 groups were assessed using survival probability as a function of time and by inspecting the visual shape of the plots, median overall survival (OS) and progression-free survival (PFS) in days. Between- and within-group analysis of variance (ANOVA) was performed for the derived AHBR and PHBR values to segregate the variation of peptide distribution and allele restriction for the two groups. Trends in distribution across G1 and G2 were analyzed through one-way ANOVA (Kruskal-Wallis H-test, performed using Microsoft Excel 2019 Office Analysis ToolPak), testing the null hypothesis of equal mean values in both groups, which was rejected if the calculated value (F-value) was greater than the critical chi-square value (F-crit) at p<0.05. Additional attributes (number of predicted antigenic peptides through NetMHCpan, number of these peptides detected in mass spectra, summation of relative abundance of cLRPs and MS-detected antigenic peptides, allele restriction by NetMHCpan-predicted antigenic peptides and mass spectrometry-detected antigenic peptides, total AHBR-PHBR score, patient OS in days) were extracted and scaled based on their minimum to maximum values for visualization in MeV (Multiple Experiment Viewer v4.9).
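For readers unfamiliar with the estimator behind the K-M plots, a minimal product-limit sketch is given below. It is written in plain Python purely for illustration and is not the R survival package code used in the analysis; the function names are hypothetical, the toy follow-up data are invented, and the log-rank test is not implemented here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time per patient (e.g. days)
    events : 1 if the event was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up


# Toy data for one group: five patients, two of them censored.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
median_days = median_survival(curve)
```

Running the estimator separately on the low- and high-burden groups gives the two curves whose shapes and medians are compared in the text.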