Assessment of NAT10 characteristics in TNBC based on TCGA database
RNA-seq data of breast cancer patients (1104 cases) with cancer tissues and (113 cases) with paracancer tissues were obtained from the TCGA database, and the Reads obtained by sequencing were used to re-annotate to. NAT10, and after comparing the expression of NAT10 in cancer tissues relative to paracancer tissues, significant high expression occurred in each PAM50 isoform, among which, the difference in high expression between the basal like isoform and the difference in high expression between the basal like subtype and the paraneoplastic tissues was most obvious (Figure1A, t-test, p<0.05); a significant high expression of NAT10 was also found in 116 TNBC cases (Figure1B, t-test, p<0.05). Further, the present study carried out an in-depth exploration of TNBC.
Since the correlation between tumorigenesis and its immune microenvironment, the relationship between NAT10 expression and immune cell infiltration in TNBC was investigated by using the immune cell abundance prediction database ImmuCellAI[29] (http://bioinfo.life.hust.edu.cn/ImmuCellAI) to predict TNBC Immune cell abundance. TNBC samples were divided into high and low expression groups using NAT10 expression. The box plots of the abundance of different immune cell infiltration between high and low expression groups showed that the infiltration abundance of immune cells such as iTreg, Th2,Th17, DC, B cell, Monocyte in TNBC was related to NAT10 expression (Figure1C, t test, p<0.05). The abundance of iTreg, Th2, DC cells were higher in the high NAT10 expression group, and the abundance of Th17, B cell, Monocyte cells was higher in the low NAT10 expression group than the other group. These results suggest that NAT10 is involved in the pathogenesis of TNBC by affecting the tumor immune microenvironment.
The changes of mRNA and lncRNA between high and low NAT10 expression groups in TNBC were also observed. The 731 significantly up-regulated mRNAs (Figure 1D) and 730 significantly differentially expressed up-regulated lncRNAs (p<0.05&|FC|>2) were screened using the limma package (Figure 1E). The pearson correlation analysis of expression between 730 differential lncRNAs and 731 differential mRNAs were performed and filter 1157 pairs of relationships contained 470 lnc,591 mRNAs(r>0.5) (Figure 1F). The expression of these 470 lncRNAs that were significantly different between high and low NAT10 expression groups in TNBC.
In order to find the most prognostically relevant lncRNA features, single-factor cox and multi-factor cox were carried out by combining the age, gender, stage, TNM stage and other clinical factors of TNBC patients. Finally, 12 lncRNAs that were significantly associated with prognosis were screened(p<0.05). The sum of the product of their multi-factor cox risk coefficients and expression values was used as the prognostic risk-score, and the median of the risk-scores was used to divide the TNBC samples, and a log-rank test was performed between the high-risk and low-risk groups. The results showed that the prognostic outcomes differed significantly between the high- and low-risk groups (Figure1G, p=0.00092), and the high-risk group corresponded to a relatively poor prognosis. The above results suggest that lncRNA is involved in the process of NAT10 promoting acetylation.
Screening of key ac4C-modified genes
Prof. Shalini Oberdoerffer from NIH published a study in CELL[30], which revealed for the first time that a large number of ac4C modifications exist on mRNA and affects the stability and translation efficiency of mRNA. In order to investigate the role of ac4C acetylation in TNBC, the genome-wide acetylation levels of cancer and paracancer tissues from two patients were examined by acRIP-seq. The cumulative distribution of ac4C modified peak on RNA structures from both sequencing showed the same results, with the enriched regions of ac4C modifications in disease and normal groups mainly concentrated in the CDS region (Figure2A-B). Peak annotation analysis showed that the enriched regions of ac4C modifications in cancer and ortho-paraneoplastic tissues generated by sequencing in both pairs of patients were mainly concentrated in the CDS, the 5'UTR and 3'UTR regions, with the least distribution in exonic regions (Supplementary Figure1). The enrichment analysis was also performed on the acetylated ac4C-modified motifs sequenced twice using HOMER software, and the enrichment of each motif was determined by scanning all sequenced sequences, and its significance was calculated by hypergeometric distribution. The top three motif predictions in order of enrichment significance p-value are shown (p<1e-15), and the results show that the motifs exhibit a high level of matching with "CXXCXXCXX..." (Figure2C).
The ac4C-seq sequencing results obtained from the two experimental conditions were screened for differentially peak ac4C genes separately. The first screening yielded a total of 350 differential ac4C genes, including 174 differentially up-regulated genes and 176 differentially down-regulated genes (Supplementary Figure2). A total of 1242 differential ac4C genes were screened in the second screening, among which there were 858 differentially up-regulated genes and 384 differentially down-regulated genes (Supplementary Figure3). The significantly different ac4C genes from two conditions were sorted and combined by difference level log2FC using RRA tool. The genes with difference significance p<0.05 in the integrated dataset and those differentially modified in both sequencing were selected as the final selection. 180 differential ac4C genes were eventually obtained, of which 30 genes were significantly different in both patients relative to the normal ac4C-modified peak at the same time, and more than half of the genes modified peak showed the same differential trend (Figure2D). The mean values of these genes relative to the normal modification peaks in both patients as well as the mean expression levels are also shown in Figure2E. Further, the gene expression data from TCGA and sequenced gene expression data were corrected for batch effects using the COMBAT in R. The gene expression profile of 180 differential ac4C genes in TCGA for TNBC showed the same trend in the expanded sample for the 180 genes (Supplementary Figure4).
Biological function of the ac4C differential genes in TNBC
In order to explore the biological functions of the 180 peak differential ac4C genes, GO annotations based on the DAVID database were performed at three levels, biological processes (BP), molecular functions (MF), and cellular components (CC), respectively. Fisher's exact test was used to calculate the significance level (P-Value<0.01) of each analysis. The results showed enrichment in BP such as mRNA transcription-translation processing, cell-cell adhesion, CC such as extracellular exosomes and adherent patches, and MF related to gene expression such as protein binding, RNA binding, and translation (Figure3A). Pathway annotation of the screened differential ac4C genes was performed based on KEGG, and the significance level of Pathway was calculated using Fisher test (P-Value<0.05). The results showed that ac4C modified altered genes were enriched in pathways such as shear body, RNA transport, lysosome and hepatitis B(Figure3B).
Further, Gene expression levels of 180 differential ac4C genes were combined with clinical factors such as age, gender, stage, and TNM stage of TNBC patients in order to perform univariate cox and multivariate cox regression analyses to find the most prognostically relevant features. Six ac4C genes were finally screened that were significantly associated with prognosis (p<0.05): COL3A1, CYFIP1, SFN, SMOC2, TRIR, and USP8. Prognostic risk scores were obtained by summing the product of the multivariate cox coefficients and gene expression values of the six prognostic risk ac4C genes. The median value of the risk scores was used to classify TNBC samples into high and low risk groups. The log-rank test was performed between the two groups, and the results showed that the prognosis was significantly different between the high and low risk groups (p<0.05) (Figure 3C), and the high risk group corresponded to a relatively poor prognosis. In addition, using the mean values of the expression levels of the six prognostic risk genes to classify the samples into high and low risk groups, the final infiltration Score of all immune cells differed between the high and low groups (Figure3D) (p<0.05).
The ac4C-related prognostic risk lncRNAs in TNBC
To explore lncRNAs specific in TNBC, t-tests were performed between TNBC and normal groups were screened in TCGA, and got 2544 lncRNAs with significant differences in expression (p.adj<0.05). With pearson correlation analysis 180 ac4C differential genes and 2544 differential lncRNAs were screened for significant correlations between them (pearson, r>0.5, p<0.05). A co-expression network (Figure4A) was constructed, totally 1080 pairs of lncRNA-gene relationship pairs were screened. The results included 436 lncRNAs with differences between TNBC and normal, 116 ac4C differential genes. In particular, 5 out of 6 ac4C prognostic risk genes were found in the network, and 44 lncRNAs were co-expressed with them, including 3 NAT10-influenced prognostic risk lncRNAs screened out: LINC01614, OIP5-AS1 and RP5-908M14.9 (Figure4B). The one-step nearest neighbor of three lncRNAs was extracted from the co-expression network (Figure4C), and LINC01614-COL3A1, OIP5-AS1-USP8, RP5-908M14.9-TRIR showed strong correlation in expression in TNBC. The expression levels of COL3A1、TRIR showed an upregulation trend in TCGA, but USP8 showed an downregulation trend (Figure4D). Also, all 3 genes showed differential acetylation, higher in cancer than in paracancerous tissues (Figure4E). Analysis of the data in the TCGA database also suggests that NAT10 protein may further regulate the expression of ac4C gene by regulating the expression of related lncRNAs thereby affecting the process of acetylation modification and participating in TNBC prognosis.
Acetylating ac4C genes predict drug sensitivity in TNBC
TNBC is insensitive to endocrine therapy because of ER and PR negativity, and insensitive to targeted therapy (Herceptin) because of HER-2 negativity. The main treatment option is chemotherapy, but it responds poorly to conventional chemotherapy and easily resistant. It is important to explore markers that can be used to predict drug sensitivity. To further determine whether differential ac4C genes could help predict effective drug targets in TNBC, 79 in 180 differential ac4C genes were screened from the DRUGBANK database that could be used as drug targets corresponded to 35 drugs. Among them, genes USP8, one of the three prognostic risk ac4C gene obtained from the previous analysis, is drug target, and corresponded to drugs is Zn(II). Zn(II) was reported to be antiproliferative in human cancer cells[31] and had stronger active cytotoxicity in inducing morphological changes in breast cancer cell lines compared to cisplatin, and non-toxic to fibroblasts[32]. The network of differential ac4C genes with drug targets is shown in Figure5A, in which the orange and purple nodes are 79 differential ac4C genes and the green nodes are drug names, and the purple nodes are prognostic risk genes related to NAT10.
Furthermore, to explore the predictive effect of the screened three prognostic risk ac4C genes (USP8, COL3A1、TRIR) regulated by NAT10 protein on drug-sensitive response, the sum of the product of the cox risk coefficients of the 3 genes and the expression values were used as the prognostic risk score, and the median value of the risk scores were used to divide TNBC samples with high and low groups. Drug sensitivity prediction was performed using the R package "pRRophetic". According to the latest expert consensus, the drugs commonly used in TNBC are Doxorubicin and Paclitaxel. (Figure 5B, 5E), the IC50 data of the two drugs were obtained from GDSC database. The linear model was used to verify whether the drugs met the linear model, and the QQ plot showed that both drugs were in line, and the linear model criteria could be used to predict the IC50 values of the samples using ridge regression. Next, ridge regression was used to estimate the half-maximal inhibitory concentration (IC50) of the TCGA samples and tenfold cross-validation was used to assess the predictive accuracy of the estimated IC50. The Mann-Whitney-Wilcoxon test was used to test whether the IC50 distribution was the same for the high-risk and low-risk groups.
Doxorubicin, a highly effective broad-spectrum anticancer drug widely used in TNBC, and an anthracycline antibiotic that embeds between the DNA double helix and inhibits replication after unstranding, significantly predicted drug sensitivity in both high and low risk groups, allowing particularly accurate estimation of IC50 values in patients (Figure5C, 5D). Paclitaxel inhibits microtubule depolymerization by by inhibiting mitosis in cancer cells, and the PD-L1 antibody atezolizumab in combination with albumin-paclitaxel was approved by the FDA as standard therapy for PD-L1-positive TNBC. In TNBC samples from TCGA. The sensitivity of the drug was significantly predicted in both high and low risk groups classified by prognostic risk ac4C genes regulated by NAT10 protein, and the IC50 value could be accurately estimated (Figure5F, 5G). However, there was no significant difference in sensitivity between the two drugs in the high and low risk groups (Supplementary Figure5), which also demonstrated the high risk of TNBC and the lack of effective targeted drugs. In particular, lncRNAs (LINC01614, OIP5-AS1 and RP5-908M14.9) interacting with three prognostic risk ac4C genes can also significantly predict drug sensitivity in TNBC patients, but with low relative accuracy (Supplementary Figure 6).