Conjoint analysis of m6A regulators and copy number variations in thyroid carcinoma

Background: Most patients with thyroid carcinoma have a good prognosis, but some thyroid carcinomas are aggressive and prone to recurrence and metastasis. Further, concomitant overdiagnosis and overtreatment are important issues. This study aimed to investigate the relationship between mutations and copy number variations in m6A regulatory genes and the clinicopathological features of thyroid carcinoma. Results: Advanced pathological stage and T stage were signicantly correlated with changes in m6A regulatory genes (p<0.05). Patients with abnormal copy numbers of m6A regulatory genes had a signicantly shorter progression-free interval than patients with normal copy numbers. Mutations and copy number variations in m6A regulatory genes were signicantly correlated with advanced pathological stage and T stage. Conclusions: Copy number variations were a poor prognostic factor for thyroid carcinoma as evidenced by the signicantly short progression-free interval in patients with copy number variations in m6A regulatory genes. Further, the ZC3H13 gene may be involved in the occurrence and development of thyroid carcinoma by regulating cancer-associated signaling pathways and biological processes. These ndings may help distinguish patients with poor prognoses who may need more aggressive treatment from patients who have better prognoses.


Background
Thyroid carcinoma (THCA) is the most common malignant tumor of the thyroid gland, accounting for approximately 1% of systemic malignancies. THCA is classi ed into papillary carcinoma, follicular carcinoma, undifferentiated carcinoma, and medullary carcinoma, with papillary carcinoma being the most common type and having the best prognosis. Except for medullary carcinoma, most THCAs originate from follicular epithelial cells. The diverse forms of RNA, such as rRNA, tRNA, mRNA, snRNA, and other chemical modi cations of RNA, have received increasing research attention in recent years for their roles in cell biology processes. Among these processes, mRNA modi cation plays a crucial role in regulating posttranscriptional levels of gene expression.
Copy number variations (CNVs) are a kind of structural variation phenomenon and are ubiquitous in the human genome. CNVs are mainly caused by copy number ampli cation, deletions, missing, insertions, recombinations, and complex mutations at multiple sites with resultant abnormal fragments ranging in size from dozens of bases (> 50 bp) to several megabases. The American geneticist Calvin Bridges rst discovered CNVs in drosophila in 1936, and subsequent studies found that CNVs also exist in other species [9]. Accordingly, CNVs have been determined to directly affect and cause diseases by changing gene dosing, disturbing coding sequences, and interfering with remote regulation. Therefore, this study aimed to investigate the relationship of mutations and CNVs in m6A regulatory genes with the clinicopathological features, including disease progression-free intervals (PFI), of THCA. Toward this goal, we analyzed the clinical and sequencing data of THCAs from The Cancer Genome Atlas (TCGA) database and further evaluated the mutation pro les of 14 m6A-regulated genes in patients with THCA.

Results
Characterization of mutations and CNVs in m6A regulatory genes Using the TCGAbiolinks R package, data of 492 patients with THCA with m6A regulatory gene mutations were downloaded (Table 1). Of these patients, only twelve had mutation information on all 6 m6A regulatory genes. The KIAA1429 gene was found to have a frame shift insertion in two patients. The METTL14 gene underwent a silent mutation in the 5′UTR region in one patient. The RBM15 gene had a missense mutation in one patient, resulting in an amino acid replacement (Ser to Arg). The ZC3H13 gene had missense mutations in three patients, with two kinds of mutations in one patient, and seed codon changes in another patient (R646*). The readers YTHDC1 and YTHDC2 had diverse mutations in three patients. Overall, functional changes, including frameshift mutations in KIAA1429 and YTHDC1, a splice donor variant in YTHDC2, and a stop codon mutation in ZC3H13, were the most signi cant. CNV data of the 399 patients were downloaded from the cBioportal platform (https://www.cbioportal.org/) ( Table 2). The 14 m6A regulators all had slight CNVs, ranging from 1-3%, that is, of the 399 patients, 5-10 had CNVs. Notably, the copy number abnormality of the writer gene ZC3H13 reached 5.01%. This affected 20 patients, 15 of whom had missing copy numbers and 5 of whom had increased copy numbers. The second most frequent gene with CNVs was the eraser ALKBH5. It was mutated in 19 patients, 4 of whom had missing copy numbers and 15 of whom had increased copy numbers. The abnormal frequency distribution of copy numbers for all m6A methylation regulators is shown in Fig. 1. We found that m6A methylation regulators mainly mutated through copy number gain (Fig. 2). Figure 3 shows the most common patterns of CNVs in m6A regulatory genes among the 399 patients with THCA. Analysis of the relationship of changes in m6A regulatory genes with clinical pathology and molecular characteristics We evaluated whether there were signi cant differences in clinical factors such as age distribution, sex, and pathological stage between patients with mutations or CNVs and those without. In total, 56 patients with mutations in m6A regulatory genes or CNVs were included in the analysis ( Table 3). The chi-square test showed that advanced pathological stage and T stage signi cantly correlated with changes in m6A regulatory genes (p < 0.05). Analysis of BRAF and RAS genes, which are known to be related to THCA, showed that changes in these two genes did not signi cantly correlate with changes in m6A regulators (Table 4).  **Cases of neither mutations nor CNVs as con rmed through the TCGA database.

CNV, copy number variation
Correlation between different CNV patterns and mRNA expression of m6A-regulated genes As only a few samples had CNVs, we combined loss type (i.e., deep deletion and shallow deletion) with copy number loss and combined gain type (i.e., copy numbers gain and ampli cation type) with copy number increase. Then, t-tests were performed to compare the differences in expression between samples with different copy number type (loss, diploid, gain) for each m6A methylation regulator (Fig. 4). Results with no signi cant or increased copy numbers and low expression are shown in Additional File 1.
Next, we further estimated the effect of changes in m6A regulatory genes on their expression pro les. As shown in Fig. 4, the expression of six m6A regulatory genes was signi cantly correlated with their different copy number states. Increased copy number was associated with increased mRNA expression, while copy deletion resulted in decreased mRNA expression. The m6A regulatory genes whose expression signi cantly differed between samples with different copy numbers (three or two, some samples have only missing or increased copy numbers) included FTO, METTL3, YTHDF1, YTHDF2, ZC3H13, and WTAP. In addition, we also found that for ZC3H13 and WTAP, the median expression in samples with increased copy numbers was lower than that in samples with normal copy numbers. This may be caused by differences in the number of samples, that is, the number of samples with increased copy numbers being lower than that of samples with normal copy numbers.
Analysis of the association between CNVs of m6A regulatory genes and patient survival The comparison of PFI between patients with normal copy numbers and those with abnormal copy numbers is shown in Fig. 5. We found that patients with abnormal copy numbers had signi cantly lower PFI than patients with normal copy numbers (p < 0.05), indicating that an abnormal copy number was associated with a worse prognosis in patients with THCA. However, we found no signi cant difference in PFI between patients with copy number gain and those with copy number loss (Additional File 2). We also analyzed different status of PFI in different states of each gene (Additional File 3) and found no signi cant difference, possibly because the number of patients with CNV changes in a single gene was too small to yield a measurable effect.
Multivariate cox regression analysis of m6A regulatory genes Univariate and multivariate Cox regression analyses of patients with THCA showed that pathological stage and M stage were correlated with PFI (Table 5). Because only pathological stage and M stage signi cantly correlated with PFI in the univariate analysis (p < 0.05), they were the only factors included in the multivariate Cox analysis. These results, in turn, showed that pathological stage remained associated with PFI; thus, it was deemed an independent prognostic factor in THCA. Considering that ZC3H13 plays an important role in the methylation process and the frequency of mutations and that this gene showed the highest CNVs in patients with THCA, we explored the biological function of this gene in the pathogenesis of THCA. We found that low ZC3H13 expression was signi cantly correlated with oxidative phosphorylation, Parkinson's disease, Alzheimer's disease, and glutathione metabolism ( Table 6, Fig. 6). High ZC3H13 expression was associated with cancer-related pathways such as the TGF-ß signaling pathway and the Wnt signaling pathway, as shown in Table 7.

Discussion
Although THCA generally has a good prognosis, some high-risk patients develop metastasis and recurrence and can thus die from the disease. Therefore, relevant diagnostic and prognostic indicators of THCA need to be identi ed. Our analysis of the relationship between mutations and CNVs in m6A regulatory genes and clinical pathology showed that advanced pathological stage and T stage were signi cantly associated with changes in m6A regulatory genes. We then evaluated the effect of changes in m6A regulatory genes on their expression pro les and found that the expression of six m6A regulatory genes (i.e., FTO, METTL3, YTHDF1, YTHDF2, ZC3H13, and WTAP) signi cantly correlated with their different copy number states. An increased copy number was associated with mRNA expression, while copy deletion was associated with reduced mRNA expression. We also found a signi cantly lower PFI in patients with abnormal copy numbers than in those with normal copy numbers, indicating that abnormal copy number is a factor indicative of a poorer prognosis in patients with THCA. We also analyzed the PFI in different states of each gene. Because the number of patients with a single-gene CNV change was too small, the results were not signi cant. Furthermore, pathologic stage was an independent prognostic factor for patients with THCA. Therefore, we speculated that CNVs are involved in the occurrence of thyroid cancer by affecting cancer-related signaling pathways in which m6A regulatory genes are involved by changing their mRNA expression.  [11,12]. However, there are no concise prospective data to support the use of molecular markers alone to determine the degree of treatment or to predict the prognosis of patients with THCA.
New evidence indicates that m6A is involved in various aspects of RNA metabolism, including pre-splicing of mRNA, 3′end processing, nuclear export, translation regulation, mRNA attenuation, and noncoding RNA processing [7,[13][14][15]. m6A RNA modi cation affects tumor proliferation [16], differentiation, tumorigenesis, invasion [17], and metastasis [18] by regulating proto-oncogenes and tumor suppressor genes. Similarly, CNVs play a crucial role in the occurrence, development, and outcome of several cancers, such as lung, endometrial, prostate, and gastric cancers [19][20][21][22]. However, the role of m6A methylation modi cation and CNVs in THCA is unknown. In this study, we analyzed mutation data and CNV data of m6A regulatory genes in 492 patients with THCA from the TCGA database and found that KIAA1429 and YTHDC1 with frameshift mutations, YTHDC2 with splice donor variant changes, and ZC3H13 with stop codon mutations had the greatest impact on gene function, indicating that "writers" and "readers" may play an important role in the occurrence of thyroid cancer. The "writers" METTL3 and METTL14 are more likely to be mutated or undergoing CNVs in other genes in clear cell renal cell carcinoma [23], while the changes of "erasers" FTO and ALKBH5 have been proven to be more important in breast cancer, glioblastoma, and hematological malignancies [24][25][26]. The differences in genes related to different tumor types suggest that the regulation of m6A at the cellular level is complicated, and further studies are needed to investigate the regulatory mechanism of m6A in thyroid cancer.
The "writer" ZC3H13 gene, which is a canonical CCCH zinc nger protein that harbors a somatic frameshift mutation in colorectal cancer, suppresses colorectal cancer proliferation and invasion by inactivating Ras-ERK signaling [27], indicating that ZC3H13 may serve as a tumor suppressor gene. However, Gewurz et al. found that ZC3H13 may be a key upstream factor of NF-κB responsible for its activation [28]. Activation of NF-κB signaling accelerates tumor proliferation and invasion [29], suggesting that ZC3H13 may be an oncogenic protein. ZC3H13 may bind with K-ras, which is frequently mutated in various cancers such as non-small cell lung cancer and colon carcinoma [30,31] and is strongly associated with cancer progression [32]. Thus far, the expression and biological function of ZC3H13 in malignant tumors are still unclear. It is speculated that the ZC3H13 gene may also play a major part in the occurrence and development of THCA by regulating cancer-associated signaling pathways and biological processes. Further studies are needed to identify the speci c impact of ZC3H13 on the regulation of the downstream genes. With improvements in CNV detection technology, its pathogenic mechanism and relationship with gene mutations are expected to be widely recognized. CNVs in m6A regulatory genes are expected to provide new directions for the diagnosis and prognosis of thyroid cancer.
This study has some limitations that need to be considered when interpreting the ndings. First, the number of samples with increased copy numbers and samples with normal copy numbers (26 vs. 240) was not evenly distributed. Second, a relationship between the expression levels of ZC3H13 and the risk of THCA was not observed in this study; thus, further research is necessary to clarify this ambiguity. Despite these limitations, the study remains valuable because to the best of our knowledge, this is the rst study to investigate the genetic changes of m6A regulatory genes in THCA and to analyze the relationship of mutations and CNVs in m6A regulatory genes with the clinical pathology of THCA. To further clarify the speci c target mRNAs of m6A modi cation in the occurrence and development of THCA, we plan to conduct m6A-Seq and m6A MERIP studies in clinical samples to support our ndings.

Conclusion
This study revealed that mutations and CNVs in m6A regulatory genes were signi cantly correlated with advanced pathological stage and T stage, indicating that CNVs are associated with a poor THCA prognosis. Patients with CNVs in m6A regulatory genes had signi cantly lower PFI than those without CNVs. ZC3H13, the m6A regulatory gene that most frequently had CNVs, may play a major role in the occurrence and development of THCA by regulating cancer-associated signaling pathways and biological processes. These ndings provide evidence that elucidates the importance of epigenetic modi cation of RNA in THCA.

Data collection and preprocessing
All the clinical data, mutation pro les, and mRNA expression data are publicly available and from an open source. We systematically searched the TCGA o cial website (https://portal.gdc.cancer.gov) to collect clinical data, mutation pro les, and mRNA expression data of patients with THCA. Data of 492 patients with THCA with m6A regulatory gene mutations were downloaded through the TCGAbiolinks R package, whereas the CNV data of 399 patients were acquired from the cBioportal platform (https://www.cbioportal.org/). Clinical data for 301 patients with THCA who had both PFI and copy number information were downloaded using TCGAbiolinks R. For patients who developed recurrent progression, we used the PFI from the rst occurrence of disease. Thereafter, we excluded patients with PFI less than 30 days. Thus, 266 patients were included in the analysis; of them, 240 patients had normal copy numbers and 26 had abnormal copy numbers (i.e., increased or missing copy numbers).

Gene set enrichment analysis
Gene set enrichment analysis (GSEA) is a tool for analyzing data on whole-genome expression pro le microarrays and was used to explore the potential molecular mechanisms underlying the prognostic gene signature constructed in this study. GSEA was applied to compare enriched terms between the high-and low-risk groups of patients with THCA as well as to investigate the correlation with the Kyoto Encyclopedia of Genes and Genomes pathway. The patients were divided into two groups, the high expression group and the low expression group, based on the mean ZC3H13 expression. Then, a single-gene GSEA was performed on ZC3H13.

Statistical analysis
All statistical analyses were performed using SPSS 20.0 (IBM, Chicago, USA) and GraphPad Prism 6.0 (GraphPad Software, La Jolla, CA, USA). We explored the relationship between CNVs of m6A regulatory genes and clinicopathological characteristics such as age, sex, and TNM stage using the chi-square test or Mann-Whitney U test.
Kaplan-Meier curves and the log-rank test were used to evaluate the prognostic value of alterations in m6A regulatory genes. Cox proportional hazard regression was performed using SPSS. A p-value < 0.01 and a false discovery rate of q < 0.05 were considered to indicate statistical signi cance.

Ethical considerations
All the clinical data, mutation pro les, and mRNA expression data are publicly available from TCGA o cial website and cBioportal platform, which are open to the public under some guidelines. Therefore, the requirement for ethical approval was waived. The need for informed consent was also waived owing to the retrospective study design. Authors' contributions: HH conceptualized and designed the study, carried out investigations and formal analysis, and prepared the original draft. JYY carried out study validation and formal analysis, and reviewed and edited the manuscript.
XDF carried out visualization and investigation. YQN was responsible for data curation and software. DCL was responsible for resources and project administration. ZFZ supervised the study and acquired funding for the study.