Analysis of SmartSeq2 breast cancer cells
To demonstrate the utility and novelty of VDJView, we analysed scRNA-seq data (full-length transcriptome, SmartSeq2 protocol) from the primary breast tissues and metastatic lymph nodes of 11 subjects [11]. We applied VDJPuzzle [2] to the scRNA-seq data to quantify the gene expression and reconstruct the receptors of the T and B cells, inputting the results into VDJView. We found 170 single B cells with at least one full-length H, L or K chain, of which 101 had a full-length heavy and light chain. Similarly, we found 42 single T cells with at least one full-length α or β TCR chain, of which 30 had paired TRα and TRβ chains. Thus, we have uniquely identified T and B cells via their receptor, and confirmed the findings achieved by the authors of the original work by identifying immune cells through gene enrichment analysis [11]. In addition to these, we found 33 cells with TCR and BCR chains, suggesting that they were likely contaminants or doublets. While our analysis of the T cells revealed a highly diverse repertoire (Supplementary Figure 1), we identified a clone in BC03 which was present in both primary and metastatic lymph node tissues, as well as 31 B-cell clones, with clonotypes shared across primary and metastatic tissues, and across subjects (Figure 1 and Supplementary Figures 1-2, Supplementary Tables 1-2). Pseudo-time analysis on these immune cells showed that a common repertoire of B cells is involved in breast cancer with a migratory pattern between primary and metastatic tissues (Figure 1). We used VDJView to integrate immune receptor information with the gene expression profile and available metadata, and performed unsupervised clustering. The unsupervised clustering (Supplementary Figure 4) revealed evidence of 8 clusters based on identity (B and T cells), B-cell isotype, tissue of origin and cancer molecular subtype. T cells largely formed a single cluster with marker gene CD96 associated to immune modulation, as well as expression of IL2R-γ and FYB which is known to control IL-2 secretion. The remaining clusters were largely composed of B cells based on tissue of origin, molecular subtype of cancer, and notably a cluster that was composed of IgG1 B cells in metastatic lymph-node of double positive breast cancer, expressing gene signature suggesting they are highly active and differentiated B cells, e.g. plasmablast following a reactivation of memory B cells. In this cluster, the over-expression of PAX5 and TCL1A could also indicate presence of malignant immune cells as these genes are often found in leukemia and likely to contribute to oncogenesis BCL6 [13, 14]. Further analysis of this data is detailed in Supplementary Note 2.
Analysis of 10X antigen specific CD8+ T cells
To further demonstrate the utility of VDJView, we have analysed the recently published scRNA-seq data with TotalSeq and dextramer stained CD8+ T cells. This dataset contains single cell data on over 150,000 CD8+ T cells isolated from 4 healthy donors, two of which were CMV positive, 44 dextramers were simultaneously used in each subject to isolate antigen specific T cells across viral infections (CMV, EBV, HPV, Influenza, HIV), and cancer (e.g., MART, MAGE NY-ESO). We used this data to study the clonal distribution within and across specific antigens and link this information to the gene expression and other metadata.
In this analysis, we uploaded and analysed the TCR sequences and the gene expression matrices available on the 10X Genomics website. Utilising the available csv template in VDJView, we generated a third file containing the available metadata for each cell, e.g., subject ID, TotalSeq 15 surface markers including T cell differentiation markers (CD45RA, CD45RO, CCR7) and exhaustion and activation markers such as HLA-DR and PD-1, and tetramers read-counts (HLA-I restricted epitopes), MHC allele and other information. Given the large number of cells in the dataset, which can be a limitation for the standard computational resources available to the user, we used VDJView to randomly sample 15,000 cells from donor 1. We then performed quality control on the data, filtering out cells with >15% mitochondrial genes or abnormally high total expression counts, leaving 11,675 cells. After removing these obvious outliers, contaminants and poor quality cells, we filtered out cells with low tetramer read counts, or tetramer read counts that were not significantly higher than the negative control tetramers (also available in the dataset). This filtering resulted in 3,815 antigen specific T cells.
We used this set to explore the distribution of genes, markers for T cell differentiation, receptor clonotype, and tetramer specificity. Unsupervised analysis (Figure 2a) revealed 8 clusters with marker genes identifying signatures of cytotoxic activities of CMV, EBV and Influenza specific CD8+ T cells, and the presence of memory and naïve T cells (e.g., CCR7+ CD45RO+ and CCR7+ CD45RA+), thus, revealing clustering based on epitope specificity, T-cell differentiation and TCR specificity. Specifically, clusters 1 and 4 showed clonally expanded populations of EBV specific memory cells identified by marker genes being TCR V genes and by CDR3 specificity. Cluster 2 revealed influenza specific memory cells, expressing TRBV19, known to code for a public TCR specific to the highly-conserved M158-66 immunodominant epitope [15]. Clusters 3,5, and 6 mostly revealed CMV-specific cells displaying no obvious clonality. These three CMV-specific clusters revealed heterogeneous expression of Granzyme H and B genes, and of transcription factors LEF1, TCF7, and ZNF683 (Hobit), which are regulators of T-cell differentiation. The cells in cluster 6 expressed CCR7 and were clearly naïve (CD45RO- CD45RA+). Finally, cluster 7 formed CMV and EBV specific and clonally expanded memory T cells, revealed by the same TCR CDR3 sequence. Notably, despite the filtering of low quality cells, cluster 8 revealed cells with reduced expression of all marker genes, including housekeeping genes RPL7 and RPL27, and with the highest percentage of mitochondrial genes, thus reinforcing the importance of quality control steps in scRNA-seq analysis.
We then utilised the dimensionality reduction features of VDJView to further explore clonality within these subsets. We used the t-SNE plots (Figure 2b) generated utilising the gene expression profiles to explore protein and tetramer expression, as well as other metadata information. As expected, the clusters identified via SC3 largely formed distinct clusters, with EBV and influenza specific T cells revealing the highest tetramer read counts, suggesting a high binding affinity of these cells for the cognate antigens. Within the CMV and EBV specific T cells, clonally expanded T cells formed larger clusters, suggesting a common gene signature in clonally expanded populations. By marking the expression of genes such as GZMH, LEF1, TCF7, CMC1 and CCR7 gene expression, the t-SNE plots revealed sub-clusters based on the differentiation status of T cells. Finally, we performed pseudo-time analysis (Figure 2c) to reveal a naïve to effector phenotype transition, shown by the increase in CD45RO expression, which is inversely mirrored in CD45RA expression. This analysis showed that naïve T cells identified in cluster 6 in the SC3 analysis formed a separate branch, while memory T cells were distributed across the pseudo-time structure.
We also analysed the TCRs of all T cells from donors 1 and 2. After performing the same quality control and filtering as outlined above, we were left with 55,922 antigen specific T cells (14,199 from donor 1 and 41,723 from donor 2). Both donors displayed clonally expanded populations (Figure 3), with 3 unique TCR expanded across at least 1,000 cells, and over 16 expanded across at least 100 cells. Both donors displayed VDJ gene usage bias, with a relatively high usage of TRBV19 common to both donors. We identified a total of 15,600 unique TCRs, with 411 TCRs common in both donors (Table 2 shows 15 of these). We also found evidence of cross reactive TCR that target different antigens within the same species, or across species, opening further avenues of study.