Molecular imprint of epithelial origin identies immuno-subtype of thymic epithelial tumors

Thymic epithelial tumors (TETs) are common tumors in human anterior mediastinum with limited biological understanding. Through decoding the immune landscape of tumors, we reclassify TETs into three types based on T cell developmental patterns. We uncover the developmental dysfunction and TCR repertoire of tumor-inltrating T cells by cell atlas. Moreover, we identify the unique subset of tumor cells with distinct epithelial origin in each TETs type. Furthermore, we demonstrate that KRT14/GNB3+ mTECs-like cell accumulation inhibits the T cell positive selection in type 1 TETs, while CCL25+ cTEC-like cell promotes T cell positive selection in type 2. Interestingly, although CHI3L1+ mTEC-like cell in type 3 TETs loses the function of supporting T cell development, it acquires the capacity to induce CD8+TRMs-mediated response. Finally, we propose a new molecular classication of human TETs using GNB3 and CHI3L1 to distinguish the epithelial origin of tumor cells, which is promising in prognostic prediction. development in thymus to establish central self-tolerance 11 12 . Clinical studies showed that patients with type B thymoma, which may be derived from mTECs, have higher incidence of autoimmune diseases than patients with type A and type C thymoma 13 14 . These evidence suggests that the tumor cells of human TETs originated from different subset of TECs may impact the T cell development in tumor differently.


Introduction
Thymic epithelial tumors (TETs), including thymomas and thymic carcinomas, are common primary tumors in human anterior mediastinum 1 2 . Owing to lack of cell lines and animal models [3][4][5][6] , the cognition regarding the etiology and biology of TETs is far from adequate. All types of TETs are considered potentially malignant in the latest clinical guideline 7 because previously de ned benign tumors also exhibit aggressive malignant behavior clinically 8 . These evidence suggests that human heterogeneous TETs needs to be reclassi ed to accurately re ect the biological characteristics of tumors.
Generally, TETs are considered to be transformed from thymic epithelial cells (TECs) which play a crucial role in the T cell development of mammal 9 . There are two major subtypes of TECs that may be the epithelial origin of human TET, cortical TECs (cTECs) and medullary TECs (mTECs), which mainly located in cortex or medulla structure of thymus respectively 10 . It is well demonstrated that cTECs are essential for the positive selection of T cells, while mTECs are mainly involved in the negative selection during T cell development in thymus to establish central self-tolerance 11 12 . Clinical studies showed that patients with type B thymoma, which may be derived from mTECs, have higher incidence of autoimmune diseases than patients with type A and type C thymoma 13 14 . These evidence suggests that the tumor cells of human TETs originated from different subset of TECs may impact the T cell development in tumor differently.
WHO classi cation is the most common used histological classi cation of human TETs 15 16 , however its prognostic predictive value and clinical repeatability remain unsatisfactory. Another clinically used Masaoka staging system is also limited in the prognostic prediction, especially in early TETs 17 . In addition, the WHO classi cation and Masaoka staging assessment of TETs mainly depends on the experience of pathologists and surgeons to examine the tumor morphology, which unavoidably lead to certain subjectivity 18 . Recently, emerging molecular classi cations provided insight in strati cation of prognosis and treatment for patients with TETs at the molecular level 19 20 . We proposed a hypothesis that the cellular heterogeneity of human TETs which derived from distinct epithelial origin, for instance cTECs or mTECs, resulted in the discrepancy in the formation of immune landscape within each tumor type. Therefore, we conducted a comprehensive study using mass cytometry, single cell sequencing, TCR repertoire, histological analysis, FCM detection and immuno uorescence testing to decode the epithelial origin and its impact on immune composition of human TETs. We successfully reclassi ed human TETs into three subtypes according to the immune landscape of tumor, pattern of T cell development in tumor, TCR repertoire and origin of tumor cells. In addition, we uncovered the mechanisms that led to the discrepancy of T cell development in each tumor type. Based on our ndings, we proposed a tripartite model of tumorigenesis and internal tumorous T cell developmental dysfunction, which was orchestrated by the epithelial origin of human TETs. Finally, according to the epithelial origin of tumors, we established a new molecular classi cation of human TETs which had shown advantage in prognostic prediction.

Results
Immune landscape reclassi es human TETs.
It is reported that lymphocyte in ltration level in tumor was correlated with the histological classi cation of TETs for decades 13 . However, details about the immune landscape of human TETs need to be decoded. To this end, we used mass cytometry, single-cell RNA sequencing, TCR pro ling, FCM, HE and IF staining to investigate the immune landscape and its underlying mechanism of human TETs (Fig. 1A). Firstly, we established a panel of more than 40 markers to uncover the immune landscape of human thymus and TETs by CyTOF (Supplementary Table S1). Three normal thymus and twelve TET tissues were performed mass spectrometric ow tests (Supplementary Table S2). Single cell analysis showed that the cell composition of TETs was not only obviously different from normal thymus tissues, but also had a signi cant internal difference among tumors (Fig. 1B). Importantly, we found the tumor microenvironment of TETs, de ned by cellular composition, can be further classi ed into three types (type1, 2, 3) (Fig. 1B). Furthermore, each type of human TETs showed a unique histological morphology and lymphocytes in ltrating pattern (Fig. 1C).
Next, we annotated the composition of immune cells in TETs and normal thymus ( Fig. 1D and 1E). The immune cell compartment of TET samples similar to normal thymus comprised major immune lineages ( Fig. 1D and Supplementary Fig. S1A), among which CD3 − CD4 + CD8 + and CD3 + T cells were most abundant ( Supplementary Fig. S1B). Through inter-group comparisons among normal thymus tissue and each type of TETs, we found obvious differences in the major cellular components between groups, especially in CD3 − CD4 + CD8 + and CD3 + T lymphocytes (Fig. 1F, 1G, and Supplementary Fig. S1C). The proportion of CD3 − CD4 + CD8 + lymphocytes was highest in type 1 and lowest in type 3 (Fig. 1G). Moreover, from type 1 to type 3, there was an obvious decreasing tendency of CD3 − CD4 + CD8 + lymphocytes in tumor (Fig. 1G). In contrast, the proportion of CD3 + T cells, B cells, and NK cells in tumor was lowest in type 1 and highest in type 3, which had a gradual increasing tendency among three types ( Fig. 1G and Supplementary Fig. S1C). Monocytes and granulocytes were more abundant in tumor of type 3, and the proportions of DCs in tissues of both type 2 and type 3 were more abundant than normal thymus and type 1 (Fig. 1G and Supplementary Fig. S1C). It is interesting that except CD3 − CD4 + CD8 + lymphocytes, other immune cells which were related to peripheral immune response in tumor were more abundant in type 3 TETs than that in type 1 and type 2 ( Fig. 1G and Supplementary Fig. S1C). In addition, the discrepant abundance of CD3 − CD4 + CD8 + lymphocytes and CD3 + T cells in tumors among three types of TETs were further validated by ow cytometry (Fig. 1H). Our results showed that the immune landscape of tumor dramatically changed during the development of TETs. Moreover, these ndings proposed that human TETs could be reclassi ed according to the change of immune landscape in tumor.
T cell developmental pattern of TETs.
The discrepant amount of CD3 − CD4 + CD8 + and CD3 + T lymphocytes among tumors suggested that there was a discriminatory T cell developmental pattern in each TETs type. In order to further investigate the change of T cell development in TETs, we used well accepted markers, which were highly correlated with T cell development in thymus 21 , to annotate the cell subsets of CD3 + T cells in tumor and normal thymus ( Fig. 2A, Supplementary Fig. S2A and S2B). Major subsets of T cells involved in thymic T cell development were represented in the dataset 12 , including double negative (DN, CD4 − CD8 − ), double positive (DP, CD4 + CD8 + ), CD4 + single positive (CD4 + SP), CD8 + single positive (CD8 + SP), and FOXP3 + regulatory T (Treg) cells ( Fig. 2A). Phenotypic analysis showed that T cell subsets in tissues of thymus and tumors which we de ned were representative (Fig. 2B). We found the T cell composition in tumor of TETs was observably different from that in normal thymus ( Fig. 2C and 2D). Importantly, the T cell compositions in tumor were signi cantly different among TETs types ( Fig. 2C and 2D). Among tumor types, immature DP cells dominated in type 1, while mature SP cells dominated in the tumor of type 2 and type 3 ( Fig. 2D and 2E). Further analysis showed that the proportion of DN and DP cells which were at the early stage of T cell development in tumor was the highest in the tumor of type 1 and lowest in type 3 ( Fig. 2E and Supplementary Fig. S2C). It was interesting that, the proportion of DN and DP cells in tumor of TETs, especially DP cells, showed a tendency to decrease from type 1 to type 3 ( Fig. 2E and Supplementary Fig. S2C). On the contrary, the proportion of CD4 + and CD8 + T cells which were at the late stage of T cell development was the lowest in tumor of type 1 and highest in type 3 (Fig. 2E). Moreover, the proportion of CD4 + and CD8 + T cells in tumor of TETs showed a gradual increasing trend from type 1 to type 3 which was contrary to DN and DP cells (Fig. 2E). Flow cytometry detection further con rmed the discriminatory amount of DN, DP and SP cells in each TETs type (Fig. 2F). Furthermore, immuno uorescence staining showed that DP and SP cells were located in the cortex or medulla like structure in tumor of type 2 TETs respectively. However, the typical cortex and medulla like structures were not observed and DP and SP cells were distributed in the stroma of tumor in type 1 and type 3 TETs disorderly (Fig. 2G).
The proportion of naïve SP cell (naïve CD4 + and naïve CD8 + cell) and memory SP cell (memory CD4 + and memory CD8 + cell) which have been fully-developed was lowest in type 1 and highest in type 2 TETs ( Fig. 2E and Supplementary Fig. S2C). Interestingly, the tissue resident CD4 + T (CD4 + T RM ) cells, tissue resident CD8 + T (CD8 + T RM ) cells and Treg cells, which generally accumulated in peripheral tissues 22 , showed a higher tendency in the tumor of type 3 TETs than the other tumor types (Supplementary Fig.   S2C). Besides, there is no signi cant bias in the ratio of CD4 + and CD8 + T cells among tumor types ( Supplementary Fig. S2C). Taken together, these ndings uncovered the unique pattern of T cell development in each type of tumor which further supported our reclassi cation and suggested that T cell developmental dysfunction occurred in the tumors of human TETs.

Dysfunction of T cell development in TETs.
To further uncover the T cell developmental dysfunction in tumor of human TETs, we performed scRNAseq and TCRαβ pro ling on six tumor samples which had been detected the immune landscape by mass cytometry using the droplet-based 10x Genomics platform (Fig. 1A). The data of a normal thymus sample published by Jong-Eun Park et al. was also reanalyzed as normal control 21 . After quality control including doublet removal, a total of 52,788 cells from tumors and 2,845 cells from normal thymus were included in our study (Supplementary Table S3 Table  S4). Consistent with the results of CyTOF, cellular atlas showed that T cells dominated in the discrepancy of cell types between tumor and normal tissue as well as among tumor types of TETs (Supplementary Fig. S3F and S3G).
In order to further investigate the T cell developmental dysfunction in TETs, T cells were re-clustered into 18 cell clusters ( Supplementary Fig. S4A). These cell clusters were further annotated (Fig. 3A), according to the marker genes related to T cell development in thymus (Supplementary Fig. S4B and 3B). We de ned the major lymphocyte subsets involved in thymic T cell development, including DN, DP, αβT Previous CyTOF results showed there were two groups (CD3 + and CD3 − ) of DP cells in tumor of TETs ( Fig. 1D and 2A). However, solely based on the transcriptional level of CD3 expression we were unable to de ne the CD3 − subset of DP cells. To solve this divergence, we further annotated DP cells into proliferating (P) and quiescent (Q) subsets according to the expression level of MKI67 and CDK1 as previous study 21 (Supplementary Fig. S4C and S4D). Consistently, the marker genes of CD3 − DP cells, which were obtained from the protein marker-based sorting strategy in another study 23 , were obviously highly expressed in DP (P) cells ( Supplementary Fig. S4E). These results suggested a close match between the DP (P) cells de ned by scRNA-seq and CD3 − DP cells detected by CyTOF in our study.
Previous studies reported that thymic T cell development started from CD4 − CD8 − DN cells, which gradually expressed CD4 and CD8 to become CD4 + CD8 + DP cells, and then transitioned through a CCR9 high αβT (entry) stage to diverge into mature CD4 + or CD8 + SP cells 21 . Inter-group comparative analysis in our study showed that the composition of T cell subsets involved in T cell developmental stage was signi cantly different among tumor types of TETs ( Fig. 3C-3E). Speci cally, DP cells, which were in the early developmental stage, constituted the largest T cell subset in type 1, but least in type 3 ( Fig. 3D and 3E). In contrast, SP cells, which were in the late stage of T cell development, constituted the largest T cell subset in type 3, but least in type 1 ( Fig. 3D and 3E). Compared with normal thymus, the proportion of DP cells increased in type 1 TETs but decreased in type 2 and almost absent in type 3 ( Fig. 3D and 3E). In contrast, SP cells were signi cantly increased in type 2 and type 3 TETs ( Fig. 3D and 3E). It was interesting that naïve T cells were more abundant in the tumor of in type 2 than type 3 TETs, whereas memory T cells were more abundant in the tumor of in type 3 than type 2 TETs ( Fig. 3D and 3E).
These ndings indicated that T cell developmental dysfunction in the tumor of each TETs type was different.
To further decode the details of developmental dysfunction of T cells in tumors, we performed trajectory analysis of the T cell subpopulations de ned previously (Fig. 3F). Consistent with previous ndings, trajectory analysis showed an obvious T cell developmental dysfunction in each type of TETs compared with normal thymus (Fig. 3G). Through inter-group comparison, we found T cells in type 1 TETs were enriched in the early stage of development, while T cells in type 2 TETs were abundant in both early stage and late stage of development (Fig. 3G). Interestingly, almost all T cells were concentrated in the late stage of development in type 3 TETs (Fig. 3G). The expression levels of Notch and Wnt signaling pathways related genes, which regulated the early stages of T cell development 24 25 , were higher in the T cells isolated from tumor of type 1 TETs than type 2 and type 3 TETs ( Fig. 3H and 3I). In contrast, IL-7 signaling related molecules, which could drive intrathymic expansion of positively selected thymocytes prior to their export to the peripheral T cell pool 26 , were expressed higher in the T cells isolated from type 2 TETs than thymus and other tumor types (Fig. 3J). These results further demonstrated that T cell development was restrained in early stage in type1, increased in type 2 and almost absent in type 3 of human TETs.

Bias of TCR repertoire and diversity in TETs.
To investigate TCR repertoire and clonotype of T cells in tumor of TETs, TCR chains detected from the TCR-enriched 5′ sequencing libraries were ltered for full length recombinants and were associated with the cell type annotation. To analyze the usage and pairing of VJ genes in T cells, we ranked VJ genes according to genomic positions as previous study 21 . For TCRβ, we found an obvious bias in VJ gene usage among tumor types of TETs (Fig. 4A), which resembled VJ gene usage of DP and SP cells respectively ( Supplementary Fig. S5A). Consistently, VJ pairs of T cells were also signi cantly different among tumor types of TETs (Fig. 4B), and consistent with DP and SP respectively ( Supplementary Fig.   S5B). As for TCRα locus, we found a clear association between developmental timing and V-J pairing, as previously described 21 27 . During the T cell developmental stage, recombination of proximal pairs on TCRα was mainly observed in DP stage, whereas distal pair recombination was mainly observed in SP stage ( Supplementary Fig. S5C). This bias in recombination in turn restricted the V-J pairing of TCRα in DP and SP cells ( Supplementary Fig. S5D). Consistently, we observed that the VJ gene usage and pairing of TCRα in each tumor type were in a similar pattern as DP and SP cells respectively ( Fig. 4C and 4D). Furthermore, we found the degree of expansion observed among the clonotypes, which were the number of cells sharing the same clonotype in the dataset, was strongly associated with the T cell developmental states de ned by scRNA-Seq (Fig. 4E). It was worth noting that clonotype ampli cation was mainly observed in mature SP cells, but less in immature DN and DP cells (Fig. 4E). This also indicated that immature T cells (DN and DP cell) had more abundant clonal diversity than mature T cells (SP cell), which was consistent with the nding of the least clonal diversity and the most SP cells in type 3 TETs (Fig. 4F). Together, these ndings uncovered the consequence in TCR repertoire and diversity of T developmental dysfunction in each tumor type.
Epithelial origin decodes T cell developmental dysfunction in TETs.
Thymic T cell development was orchestrated by heterogeneous TECs 10 Table S6). Based on this de nition, we found that CTSL, encoding the proteases cathepsin L1, and PRSS16, encoding thymus-speci c serine protease which was reported to play an important role in thymic T cell positive selection 28 , were mainly expressed in cTEC-like cells and mcTEClike cells in TETs (Supplementary Fig. S6E). The genes, FEZF and AIRE, responsible for thymic T cell negative selection were mainly expressed in CHI3L1 + mTEC-like cells, MYOG + TEC-like cells, GNB3 + mTEC-like cells and CHGA + TEC-like cells (Supplementary Fig. S6E). Inter-group comparison showed the composition of tumor cell subsets in TETs subtypes was signi cantly different ( Fig. 5E and 5G). Besides, our results revealed the major tumor cell subsets in each type of TETs, such as KRT14/GNB3 + mTEC-like cells in type 1, CCL25 + cTEC-like cells in type 2 and CHI3L1 + mTEC-like cells in type 3 ( Fig. 5E and 5G). The main tumor cell subsets in each TETs type were further con rmed by immuno uorescence (Fig. 5H). Interestingly, we found the mcTEC-like cells presented in both type 1 and type 2 tumors had more molecular characteristics of thymic epithelial progenitor cells (TEPCs) compared to KRT14 + mTEC-like and CCL25 + cTEC-like cells 28 (Supplementary Fig. S6F), indicating KRT14 + mTEC-like and CCL25 + cTEClike cells might be differentiated from mcTEC-like cells. Through pathway analysis, we found that NF-κB signaling pathway of mcTEC-like cells in type 1 was obviously activated, while that in type 2 was inhibited ( Supplementary Fig. S6G). This meant the differentiation of mcTEC-like cells in type 1 was more inclined to be mTEC-like cells, while the differentiation of mcTEC-like cells in type 2 was more inclined to be cTEC-like cells, which was consistent with the result that mTEC-like cells increased in type 1 and cTEClike cells increased in type 2 28 (Fig. 5G). Moreover, immuno uorescence staining result showed the AIRE expression in type 2 TETs was signi cantly less than the other two types ( Supplementary Fig. S6H), which was correlated with the incidence of autoimmune disease in TETs 13 .
CXCL12 and CCL25, the key cytokines promoting homing of blood-borne lymphoid progenitor cells into thymus 28 , were almost not expressed in CHI3L1 + mTEC-like cells (Fig. 5I), the main component of the tumor cells of type 3 TETs. This observation suggested the tumor cells of type 3 TETs lost the function for recruitment of thymus progenitor cells, resulting in the T cell developmental dysfunction in initial period. Having identi ed the major TEC-like tumor cells in TETs subtypes, we used CellPhoneDB analysis to investigate the interactions between TECs-like tumor cells and T cells as previous study 29 . T cell development in thymus was a complex process involving TEC-lymphocyte cell interactions, lymphocyte cell migration and lymphocyte cell localization 30 Table S7). Interestingly, cTEC-like tumor cells also had a crucial role in inducing SP cell migration through CCL19:CCR7 interaction (Fig. 5J), which differed from the cTEC in normal thymus 31 .
To further illustrate the differences of T cell development among TETs subtypes in molecular level, we used Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis to discover the difference of signaling pathway enrichment in DP cells among tumor types. Compared with type 1 TETs, the genes highexpressed in DP cells of type 2 were signi cantly enriched in signaling pathways related to T cell expansion and positive selection, including Oxidative phosphorylation, NF-κB signaling pathway, Th17 cell differentiation, Th1 and Th2 cell differentiation and TNF signaling pathway 28 32 (Fig. 5K). In the process of positive selection, most cells would be eliminated by apoptosis, so the enrichment of highlyexpressed genes in apoptosis-related pathways also indicated that more DP cells in type 2 were undergoing positive selection (Fig. 5K). Compared with normal thymus tissue, genes high-expressed in DP cells of type 2 were also enriched in the relevant signal pathways for positive selection ( Supplementary Fig. S6J). Moreover, we analyzed the expression of marker genes which indicated the pre-or ongoing state for positive selection of DP cells 32 . We found that DP cells in type 1 TETs mainly expressed pre-selection related genes, while DP cells in type 2 TETs highly expressed genes related to ongoing state of positive selection (Fig. 5L). These results indicated the positive selection of T cell development in type 2 TETs was promoted by tumor cells, while blocked in type 1. This was further supported by the observation that the expression level of HLA-DR molecule of tumor cells in type 1 TETs was lower than other types ( Supplementary Fig. S6K). Taken together, these ndings demonstrated that unique epithelial origin of each type of TETs led to a different dysfunction of epithelium and T cell interaction and consequence in the discrepancy T cell developmental dysfunction among tumor types.
Tumor cells induce CD8 + T RM cell-mediated immune response in type 3 TETs.
Among three types of TETs, it was interesting that the thymic T cell development seemed absent in type 3 TETs, whereas mature T cells were still abundant in the mesenchyme of tumor tissue (Fig. 2C-2E). A possible reason for this phenomenon was the in ltration of immune cells induced by malignant cells of cancer, which could be commonly observed 33 . We con rmed that the CD8 + T RM (CD103 + CD69 + ) cells, playing a central role in immune sensing network of peripheral tissues especially in the recruitment of various types of immune cells when they are activated 34 , were signi cantly enriched in type 3 TETs ( Fig. 6A and 6B). The unique abundance of CD8 + T RM cells in type 3 TETs was further con rmed by ow cytometric analysis (Supplementary Fig. S7A). Compared with other epithelial subpopulations, CHI3L1 + mTEC-like cells, which were most highly enriched in type 3 TETs, were predicted to have a largest number of interactions with CD8 + T RM cells ( Fig. 6C and Supplementary Table S7). Some speci c ligand-receptor interactions between CHI3L1 + mTEC-like tumor cells and CD8 + T RM cells were identi ed, including TNFSF9:HLA-DPA1, CCL20:CXCR3, CEACAM5:CD8A ( Fig. 6D and Supplementary Table S7). Consistently, CD8 + T RM cells in type 3 TETs expressed higher level of activation and proliferation related molecules, including CD38, CD39, Ki67 and CD137, while the inhibitory molecule was decreased ( Fig. 6E and Supplementary Fig. S7B). Furthermore, genes related to activation and TCR signaling at the transcriptional level were also signi cantly upregulated in CD8 + T RM cells compared with other types of T cells in the tumor of type 3 TETs (Fig. 6F). Pathway analyses revealed the upregulated gene of CD8 + T RM cells signi cantly enriched in TCR signaling pathway, antigen processing and presentation, cytokinecytokine receptor interaction and chemokine signaling pathway (Fig. 6G). Further analysis demonstrated There were several kinds of histological classi cation and clinical stage of human TETs for decades 17 .
However, due to the limitation of current classi cation systems, novel classi cation which could perform more easily and have better prognostic predictive effect was urgent needed 18 . To this end, we tried to establish a new classi cation system of human TETs based on our above ndings, especially the unique epithelial origin of tumor cells in each type of TETs which played pivotal role in T cell developmental dysfunction and immune response in tumor microenvironment. Firstly, we selected representative genes based on scRNA-seq results which represented the unique epithelial cell type in each TETs types (Fig. 5F) and found that GNB3 and CHI3L1 were expressed highly in tumor of type 1 and type 3 TETs respectively ( Fig. 7A and 7B). Consistently, the expression of GNB3 and CHI3L1 in tumor was further validated by IF staining (Fig. 7C). Therefore, we reclassi ed 119 TETs patients of TCGA cohort into three groups according to the expression level of GNB3 and CHI3L1 (Type 1, GNB high ; Type 2, GNB low CHI3L1 low ; Type 3, GNB low CHI3L1 high ) ( Supplementary Fig. S8A). Kaplan-Meier survival analysis demonstrated that the classi cation we established closely correlated with survival of TETs patients ( Fig. 7D; P = 0.014). Moreover, it was surprising that our classi cation based on GNB3 and CHI3L1 expression showed an advantage in the prognostic prediction of TETs patients compared with both Masaoka stage and WHO classi cation ( Supplementary Fig. S8B and S8C).

Discussion
TETs are common tumors originated from human thymus, the central lymphoid organ which is essential for T cell development 35 . The particular origin of TETs suggests that the development of tumor may be involved in T cell developmental dysfunction. Through decoding the immune landscape and cell atlas of tumors, we successfully reclassi ed human TETs into three types. We linked the immune landscape and T cell development pattern of each tumor type to the dysfunction of T cell development which was controlled by its epithelial origin. Furthermore, we decoded the speci c epithelial origin of tumors and revealed the underlying mechanisms of tumor cell-driven T cell developmental dysfunction in each tumor type of TETs. Next, we uncovered the consequence of tumor cell-mediated T cell developmental dysfunction in each tumor type through TCR repertoire, diversity and related signal pathway. In addition, we decoded a unique immune response pattern of type 3 TETs which was mainly orchestrated by tumorinduced CD8 + T RM cell activation. Finally, we established a new classi cation of human TETs using the molecules which re ected the epithelial origin of each tumor type and has shown advantage in prognostic prediction.
TETs are heterogeneous tumors considered to originate from TECs 36 . However, exactly cellular origin of TETs is largely unknown 9 . Growing evidence has shown that TECs in human thymus are extraordinarily heterogeneous 28 . Therefore, it is reasonable to speculate that the epithelial origin of human TETs may also be heterogeneous. The ndings of discrepant tumor cell composition among human TETs types in our study supported this hypothesis (Fig. 5E). Besides, we identi ed the exact tumor cell subset of each human TETs type which represented its epithelial origin, for instance KRT14 + /GNB3 + mTEC-like cells in type 1, CCL25 + cTEC-like cells in type 2 and CHI3L1 + mTEC-like cells in type 3 ( Fig. 5E and 5G). These ndings in our study suggested that there was heterogeneity in epithelial origin of human TETs, which may determine the biological nature and prognosis of human TETs.
Genetic aberrations were reported in human TETs a decade ago 37 . Subsequently, a speci c missense mutation in GTF2I was demonstrated to occur at high frequency in type A and type AB thymomas, but rarely in the aggressive subtypes 38 . In contrast, thymic carcinomas carried a higher number of mutations in recurrent mutations of known cancer genes than thymomas, such as TP53, CYLD, CDKN2A, BAP1 and PBRM1 38 . Consistently, another study identi ed the recurrent somatic mutations in TET2, CYLD, SETD2, TP53, FBXW7, HRAS and RB1 in thymic carcinoma, and no mutations in GTF2I 39 . However, the details of how these driver oncogenes promote tumor developing and the key driver genes in each type of TETs are still unknown. Recently, GTF2I mutation was reported to induce the cell transformation and metabolic alterations of TECs, which provided direct evidence of GTF2I mutation in promoting tumorigenesis of TETs 40 . Through decoding the epithelial origin of each human TETs type (Fig. 5E-5G), we identi ed GNB3 and CHI3L1 which speci cally expressed in type 1 or type 3 tumors respectively in our study (Fig. 7A-7C).
These evidences suggest that GNB3 and CHI3L1 may be the driven oncogene in original TECs of type 1 and type 3 of human TETs respectively.  S8A). We observed that our classi cation of TETs had advantage in prognostic prediction when compared with WHO classi cation and Masaoka stage in the same cohort of patients (Fig. 7D, Supplementary Fig. S8B and S8C). Our ndings support the idea that the classi cation of TETs derived from biological characteristics of tumors, especially key molecular events occurred in epithelial origin of tumor, will predict the outcome and guide clinical practices more accurately.
High incidence rate of autoimmune disease is one of the most remarkable clinical features of human  14 . However, the cellular and molecular mechanisms of immune disorders in TETs were still largely unclear. We speculated the epithelial origin of human TETs, which play an essential role in thymic T cell development and central self-tolerance 13 , accounted for the generation of immune disorders in patients. As expected, we found that cTEC-like cells increased in the tumor of type 2 TETs, in contrast mTEC-like cells decreased (Fig. 5G). Consistently, AIRE, the key molecule for negative selection of T cells, was not expressed in the major subsets of tumor cells in type 2 TETs (Supplementary Fig. S6D, S6E and   S6H). These evidences suggested that the increased cTEC-like cells, decreased mTEC-like cells and decreased AIRE expression in tumor of type 2 TETs led to the dysfunction of central self-tolerance and generation of autoimmune disease in patients. In addition, the nding of relative lack of mature T cells in tumor ( Fig. 2D and 2E), suggested the defects of T cell maturity in tumor may account for the generation of acquired T cell de ciency in type 1 TETs. These ndings in our study provided a new clue to understand the generation of immune disorders in TETs patients.
Taken together, we reported a comprehensive study rstly using multiple omics to decode the biological characteristics of human TETs from immune phenotype to its underline mechanism orchestrated by the epithelial origin of tumors. We uncovered an epithelium-T cell loop which depicted the core discrepancy of biological characteristics among tumor types and reclassi ed human TETs. Finally, according to the unique gene expression in the identi ed epithelial origin of each tumor type, we successfully established a convenient molecular classi cation of human TETs which showed advantage in prognostic prediction. We believed that our ndings provided insights into understanding the biology of human TETs and will shed more light on the translational research of human TETs.

Patients and specimens
Tumor tissues (T, homogeneous cellularity, without foci of necrosis) were obtained from patients with

Immuno uorescent staining and image acquisition
The 5µm-thick FFPE sections were also backed at 37°C overnight, dewaxed in xylene, rehydrated through decreasing concentrations of ethanol, and washed in PBS. Antigen retrieval was performed in Target Retrieval Solution (TRS) citrate buffer (pH 6.0) using a pressure cooker. Then, they were blocked with a blocking buffer solution (5% FBS, 1% BSA and 0.2% Triton) for 2 h at room temperature and incubated with primary antibody in blocking buffer (BioLegend) at 4°C overnight. After washed by PBS (pH 7.4), secondary antibody staining was performed at room temperature for 1 h. Next, sections were washed three times with PBST and DAPI reagent was added for 10 min to detect cell nuclei. The antibodies of CD3 (ab16669, diluted at 1:100), CD8a (ab17147, diluted at 1:100), CD4 (ab133616, diluted at 1:500), CD103 (ab129202, diluted at 1:800), CD11c (ab52632, diluted at 1:500), CD20 (ab778237, diluted at 1:2000) and EPCAM (ab223582, diluted at 1:500) were procured from Abcam. The antibodies of CD45 Mass Cytometry (CyTOF) sample preparation 42 metal-conjugated antibodies used in this study were showed in supplementary Table 1. Brie y, the cells derived from the TETs samples were stained with 5µM cisplatin (Fluidigm) in PBS without BSA for viability staining. Then, samples were washed in PBS containing 2.5% BSA and blocked for 30 minutes at 4°C. After that, they were stained with cell-surface antibodies in PBS containing 5% goat serum and 30% bovine serum albumin (BSA) for 30 min at 4°C. Next, samples were washed, xed and permeabilized using the Foxp3 x and permeabilization kit (eBioscience) as well as 100nM Iridium nucleic acid intercalator (Fluidigm) according to the manufacturer's instructions at 4°C overnight. Cells were then washed twice with Foxp3 permeabilization buffer and incubated with intracellular antibodies in permeabilization buffer for 30min at 4°C. Finally, cells were washed twice with ddH 2 O to prepare for analysis.

CyTOF Data Acquisition and analysis
Right before acquisition, samples were washed and resuspended at a concentration of 1 million cells/ml in water containing EQ Four Element Calibration Beads (Fluidigm). Samples were acquired on a Helios CyTOF System (Fluidigm) at an event rate of < 500 events/second. EQ beads (Fluidigm) were used for a loading control. All data were produced on a Helio3 CyTOF Mass Cytometer (Fluidigm). Mass cytometry data les were normalized using the bead-based normalization software, which uses the intensity values of a sliding window of these bead standards to correct for instrument uctuations over time and between samples. CyTOF analyses were performed by PLTTech Inc (Hangzhou, China) according to the previously described protocol 43 . The data were gated to exclude residual normalization beads, debris, dead cells and doublets for subsequent clustering and high dimensional analyses. The 42 immune cell markers were all applied for clustering and visualization. Phenograph algorithm 44 was used to cluster cells. 50,000 cells were selected randomly for visualizing by t-distributed stochastic neighbor embedding (t-SNE) dimension reduction algorithm 45 using R package cytofkit (version 0.13). Immune subset cells were de ned by the median values of speci c expression markers on Hierarchical clustering. Heatmaps of normalized marker expression, relative marker expression, and relative difference of population frequency were generated using pHeatmap R package and Python (https://www.python.org/) respectively. The comparisons between two groups were assessed by unpaired Student's t-tests using GraphPad Prism (v8). The use of these tests was justi ed based on assessment of normality and variance of the distribution of the data.

Single-cell gene expression analysis
Raw gene expression matrices were generated using CellRanger (10x Genomics) and analyzed using the Seurat v3 R package 46 47 . All cells expressing < 500 genes were removed, as well as cells that contained < 500 unique molecular identi ers (UMIs) and > 25% mitochondrial counts. Samples were merged and normalized. Because every cell has a unique barcode, scRNA-seq data could be linked with the scTCR-seq data. All assembled contigs were ltered to retain only those that were assigned a raw clonotype ID and categorized as being both full length and productive. Each clonotype was assigned a unique identi er, consisting of the predicted amino acid sequences of the CDR3 regions of these two chains, which was used to match clonotypes across samples. Clonality, which re ects the dominance of particular clones across the TCR repertoire, was calculated per sample. To visualize the degree of TCR clonotypes shared between T cell phenotypes, we used barcode information to project T cells with prevalent TCR clonotypes on UMAP plots.

Pseudotime reconstruction and trajectory inference
The R package Monocle (version 2) algorithm was used to reconstruct pseudotime trajectories to determine the potential lineage development among diverse T cell subsets and epithelial cell subsets 48 .
For each analysis, PCA-based dimension reduction was performed on DEGs of each phenotype, followed by two-dimensional visualization on UMAP. Graph-based clustering (Louvain) identi ed T cell into twelve subclusters. The cell differentiation trajectory was then captured using the orderCells function. As before for differential gene expression analyses, the Rpackage MAST was used to detect genes signi cantly covarying with pseudotime, based on a log-likelihood ratio test between the model formula including cell pseudotime and a reduced model formula. Additional model covariates were included in the residual model formula. Benjamini-Hochberg multiple testing correction was used to calculate FDR, and genes < 5% FDR were considered to vary signi cantly with pseudotime. For T cells from different groups, the same process with the same signature genes and Monocle parameters was used to construct the clonebased trajectories.
Cell-to-cell communication of scRNA-seq data CellphoneDB (www.CellPhoneDB.org) was used to assess putative receptor-ligand interactions between epithelial cell and T cell subsets, CD8 + T RM cell and other cell subsets (epithelial cell, B cell, DC, Fibroblast cell, monocyte, macrophage, VSMC). Brie y, the algorithm allows the detection of ligand-receptor interactions between cell types in scRNA-seq data using the statistical framework described in previous studies 29 49 . The tool was run for 1000 iterations. The normalized interaction score has been calculated by multiplying the average expression level of ligand and receptors for all cell pairs, and maximum normalizing these values. The total number of pairwise paracrine interactions between CD8 + T RM cell and epithelial cell subsets obtained using the CellphoneDB scoring method were visualized as heatmaps in the R package pheatmap.

Survival analysis
The TCGA TETs data were used to evaluate the prognostic effect of individual genes or gene sets derived from speci c cell clusters. The gene expression and survival data were downloaded from UCSC Xena (http://xena.ucsc.edu/). 119 patients with records of GNB3 and CHI3L1 gene expression information were included into the survival analysis (of these, 117 with the WHO classi cation and Mosaok stages). The expression pro le was normalized by log2 (normalized_count + 1) to exclude potential bias. For each gene, patient cohorts were grouped into high and low expression groups by the top 20% value of the normalized average expression. Kaplan-Meier survival curves were plotted to show differences in survival time, and P values reported by the Log-rank test using GraphPad Prism (v8).

Statistical Analysis
Statistical analysis was performed using Prism version 8.3 (GraphPad Software). Unpaired Student's t tests were used where comparisons were used to identify differences between two groups. All results are presented as mean ± SEM, and P values of < 0.05 were considered statistically signi cant (*P < 0.05, **P < 0.01, ***P < 0.001 and ****P < 0.0001). Figure 1 Cellular composition of the human thymus and TETs. See also Supplementary Fig. S1, and Supplementary Tables S1 and S2. (A) Scheme of the overall study design and work ow. (B) Reduceddimensional single-cell t-SNE maps of total cells from human TETs (n = 12) and normal thymus (n=3) by CyTOF, each color represents an independent sample. (C) Representative HE and immuno uorescence (red for EpCAM, green for CD45, pink for CD3 and blue for DAPI) stained sections of TETs and normal      The ligand is secreted by the cell at the beginning of an arrow, and the receptor is expressed by the cell at the end of that arrow.