1. The characteristics of gene mutations and chromatin accessibility in co-existing paired benign nodules (PAs) and PTC (CAs)
The CAs were characterized by comparison with PAs, including SNV, indel, and CNV, using WES in four paired benign and malignant thyroid nodules from four patients (Fig. 1A, Table S1). Most SNVs and indels were located in exons and introns, whereas only a small proportion of SNVs were located in intergenic regions (Fig. 1B). However, a total of 718 mutations targeting 657 genes were identified, and the mutation rate (density) differed among the four patients with PTC (Fig. 1C, D; Fig. S1A, B). The transition to transversion ratio of SNV ranged from 0.26–0.73 in PTC (Fig. 1E), while the proportion of variation type T > G and A > C was more correlated with PTC (Fig. S1E). In accordance with a previous report, we identified similar SNVs in the driver genes of PTC, including TTN, BRAF, and MUC16 (Fig. 1F). Mutation signature analysis was performed, and seven enriched mutational signatures (S1, S3, S6, S9, S15, S25, and S28) were identified in PTC samples, in which signatures S1 and S9 were found in all PTC samples (Fig. 1G; Fig. S1F). In terms of CNV, we also observed amplification of known PTC-related genes, including TRDN, COL11A2, and BAGE. Significant deletions of other known cancer-related genes were also identified, such as PRKN. However, no consistent changes in CNV in all of the four samples were found (Table S2, Fig. S1G).
Using ATAC-seq sequencing, we generated chromatin accessibility maps of PTC and coincidental benign nodules (adenomatoid nodules) from three patients with PTC (Fig. 1A). We obtained an average of 97.93% mapability and 61.02 million qualified fragments per sample (Table S3). All samples were enriched for reads at transcription start sites (TSSs) and exhibited the expected periodicity of the insert length (Fig. S2A–C). A global epigenetic landscape of PTC and paired adenomatoid nodules was present (Fig. 2). Chromatin accessibility in CA was lower than that in PA around TSSs (distance < 1 kb), although a broadly similar chromatin landscape was observed (Fig. 2B, Fig. S2D). The number of ATAC-seq peaks (open chromatin regions) in each sample ranged from 36938 to 77749. In total, 45048 high-confidence peaks were identified across all CAs and 44443 peaks were identified across all PAs, representing a total of 18329 distinct peaks (Fig. 2B, Table S3). For instance, a peak associated with CD82 is shared by PA and CA, whereas a peak associated with APOH and FOXP3 is specific to CA, and another peak within PTENP1 is specific to PA (Fig. 2C). We then calculated the overall similarity of the ATAC-seq profiles among all samples using multidimensional scaling, which showed that the samples were clustered into three groups, each one from one patient (Fig. 2D), which was also revealed by PCA analysis (Fig. S2E). Functional enrichment analysis (FEA) by GO showed that peaks were enriched in the proximity of genes related to cell development and morphogenesis specific to PA and CA, whereas the signature common peaks were enriched in the proximity of genes related to cellular metabolism processes (Fig. 2E). The chromatin accessibility of the gene regions around PTC-specific mutation loci is also a concern. Chromatin accessibility around most PTC-specific mutation positions showed obvious differences between PA and CA (Fig. 2F), indicating the potential impact of the mutation on chromatin accessibility.
2. The distinct higher expression of more negative cancer correlation factors in PA associated with hyper-accessibility, including ARHGAP24 and ARHGEF28
Next, we sought to determine the functional consequences of the differentially accessible regions observed in PTC and benign nodule samples. Differences in chromatin accessibility between PTC and benign thyroid nodules were analyzed to explore their impact on PTC and benign nodules. By comparing the signals for each peak in PA and CA, we observed that 72,689 (77.39%) peaks had reduced chromatin accessibility in CA (Fig. 3A). This analysis indicated that a widespread decrease in chromatin accessibility was associated with PTC progression compared with that of benign nodules. We then assessed the differential accessibility between these two groups; a signature set of 144 peaks was identified, including 46 increased and 98 decreased peaks in the PTC group compared to that of paired benign nodules (Fig. 3B). The differentially open chromatin regions were largely TSS distal (> 104 bp), with relatively few proximal promoter regions exhibiting differential accessibility (Fig. 3C and Fig. S3A). Large genomic domains (on the order of 104 to 106 bp) were more likely to have differentially open regions than others regions (< 103 bp), with different peaks in these domains (Fig. 3C). More domains enriched for increased accessible peaks in CA were located closer to TSS (0-102 bp) than PA (Fig. 3C). A heatmap of the top 50 signature regions highlighted the changes in chromatin accessibility that distinguished the two groups (Fig. 3D). Signature peaks assigned to the CA group were enriched in terms associated with purine and nucleobase-containing compound metabolic processes, whereas those assigned to the PA group were enriched in terms related to the transmembrane receptor protein tyrosine kinase signaling pathway (Fig. 3E).
Using RNA-seq data obtained from the same patients, gene expression in CA samples and PA were examined, and the top 25 genes in each group were clustered in a heatmap (Fig. 4A). Next, we integrated ATAC-seq and RNA-seq data, focusing on chromatin regions and differentially expressed genes. We identified 14 genes that were both differentially expressed and contained significantly differentially accessible peaks (Fig. 4B). Of these peak/gene combinations, three were more accessible and overexpressed in CA samples (right corner), whereas 11 were more accessible with higher RNA expression levels in PA(left corner). Moreover, in the set of signature peaks, we recapitulated differentially accessible peaks surrounding the specific peak annotated genes such as WDR72 (Fig. 4C). However, differential gene expression containing PTC-specific mutations and altered chromatin accessibility between PA and CA was observed only in CCDC40 and SDK1 (Fig. 4D Data RNA-seq).
The more differentially chromatin accessible regions between PA and CA were assigned to the gene regions of WDR72, LRP1B, PCOLCE2, RYR2, ERO1LB, ARHGAP24, CDON, SERTM1, and ARHGEF28. Andtheir expression was significantly higher in benign nodules compared to that in PTC. We further validated their expression using publicly available expression profiling data of thyroid cancers using the GEPIA2 and GEO (Accession No GSE64912, GSE152515) databases, which confirmed their higher expression in normal thyroid tissues compared to that in PTC samples, including ARHGAP24 and ARHGEF28(Fig. S4A B). In addition, CD82 was found to be highly expressed in PTC samples (Fig. S4C). These data indicate that RNA-seq is representative of the expression of these genes, and that the gene expression pattern in benign nodules is similar to that in normal tissues.
3. Differential motif enrichment analysis within the differentially accessible regions revealed TEADs as the key associated transcription factors
By checking whether transcription factor (TF) binding was affected in PTC and/or benign nodules, we determined the enrichment of transcription factor motifs within the differentially accessible regions between PA and CA. The most highly enriched motif was the binding site for the TEAD family of DNA-binding factors in PA (Fig. 5A), while JUN-FOS factors were enriched in CA. Moreover, TEAD4 and TEAD2 showed significantly increased footprints in different accessibility regions when comparing PA to CA (Fig. 5B). TEAD4 expression was further confirmed by RNA-seq (Fig. 5C) and GEPIA2 and GEO (Accession No GSE64912 GSE152515) databases (Fig. S5). This pattern confirmed that chromatin accessibility of TEAD4 target sites was decreased in CA, suggesting that reduced target site binding by benign nodule-specific TFs plays a critical role in the difference between PTC and adenomatoid pathogenesis.
The target genes of TEAD4 and the related pathways were characterized. TEAD4 target genes were obtained from a combined analysis of the hTFtarget database, Gene Transcription Regulation Database (GTRD) database, and RNA-seq data (genes upregulated in benign nodules) (Fig. 5D). A total of 103 genes were identified as potential targets of TEAD4, including ARHGAP24 and ARHGEF28, the expression of which tended to increase in PA compared to CA. In particular, TEAD4 target genes were enriched in cell differentiation and GRPase-mediated signal transduction (Fig. 5E).
4. The additional analysis of extrachromosomal circular DNA (eccDNA) derived from ATAC also showed potential effects on gene expression in CA and coincidental PA
ATAC-seq data can be used to identify eccDNA carrying known driver genes [26]. In this study, eccDNA derived from chromosomes was identified using the new Circle_finder pipeline. A total of 81 eccDNAs were obtained in this study: 48 in PA and 33 in CA (Table S4). The length distribution of eccDNA is shown in Fig. 6A, and 57 eccDNAs were < 1 kb. The eccDNA was derived from almost all chromosomes (Fig. 6B). By pooling all eccDNA identified in this study, the length distribution of these circles revealed characteristic peaks at approximately 400 bases (Fig. 6C). The higher GC content was relative to the genome average (Fig. 6D) and the enrichment of the eccDNA (or microDNA) was mainly located in the gene-up 2000 bp, introns, and intergenic regions (Fig. 6E). Finally, approximately 79% of the small eccDNAs reported here appear to have used flanking sequences of 2- to 15-base microhomology (Fig. 6F) to promote the ligation that gives rise to the circle.
Furthermore, the eccDNA-derived location in the chromatin differed between PA and CA (Fig. 6G). The top five most abundant eccDNAs (tandem gene duplications) identified in each group are listed in Table S4. eccDNA harboring the GPR157 gene (cpgi) was the most abundant eccDNA in the PA samples, while harboring the PROS1 gene (intergenic) in the CA samples. No driver genes for PTC were found in these eccDNAs. Different gene expressions of UPK3B, CCDC110, and SPACA9 were found in the RNA-seq data between CA and PA samples (Data RNA-seq), and no PTC-specific mutations were observed in these genes. Gene ontology analysis of all the genes carried on the eccDNA/duplication loci was conducted (Fig. 6H). These pathways were shown to be related to the cytokine-mediated signaling pathway in PA and DNA replication in CA samples.