1. Distribution of the CNV
CNV data from 193 patients in the TARGET AML cohort (WGS; 96 females and 97 males, median age 9 years) and from 191 patients in the TCGA cohort (SNP array; 87 females and 104 males, median age 58 years) were used to investigate the distribution of CNVs in AML.
In the TARGET AML cohort, the CNV frequency was below 6% for 99.6% (17507/17586) of genes, whereas 35 genes had CNV frequencies above 90%. By contrast, the CNV frequency of any gene in the TCGA cohort was at most 14.1% (Fig 1A).
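The per-gene frequencies summarized above can be illustrated with a minimal sketch. It assumes a hypothetical gene-by-sample table of copy number calls coded as -1 (deletion), 0 (neutral), and 1 (amplification). The gene names, the toy calls, and the function names are illustrative assumptions and do not reproduce the pipeline used in this study.

```python
from typing import Dict, List


def cnv_frequency(calls: Dict[str, List[int]]) -> Dict[str, float]:
    """Return, for each gene, the fraction of samples with a non-neutral call."""
    return {
        gene: sum(1 for call in row if call != 0) / len(row)
        for gene, row in calls.items()
    }


def share_below(freqs: Dict[str, float], threshold: float) -> float:
    """Return the fraction of genes whose CNV frequency is below the threshold."""
    return sum(1 for f in freqs.values() if f < threshold) / len(freqs)


# Hypothetical toy data: 4 genes, 10 samples each (not real cohort data).
toy_calls = {
    "GENE_A": [0] * 10,          # no CNV in any sample
    "GENE_B": [1] + [0] * 9,     # amplified in 1 of 10 samples
    "GENE_C": [1] * 9 + [-1],    # altered in every sample
    "GENE_D": [0] * 10,          # no CNV in any sample
}

freqs = cnv_frequency(toy_calls)
low_share = share_below(freqs, 0.06)  # same 6% cutoff as quoted in the text
```

Here a gene counts as altered in a sample whether the call is an amplification or a deletion; counting the two separately would be an equally reasonable choice.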
Consistent with a previous report[9], this study showed that CNVs were not randomly distributed across the chromosomes. In both the TARGET and TCGA cohorts, amplifications were more frequent in chromosomes 1, 8, 19, 21, and 22, and deletions were more frequent in chromosomes 7, 16, and X. There were few CNVs in the normal tissues compared to the AML samples (Fig 1B).
The results also showed distinct CNV distribution patterns among the three risk groups in the TARGET cohort. CNVs occurred most frequently in the standard-risk group. In the high-risk group, amplifications were more frequent in chromosome 7 and deletions were more frequent in chromosome 19. Deletions in chromosome 7 occurred frequently in the standard-risk and low-risk groups. Amplifications in chromosome X occurred mainly in the standard-risk group, and deletions in chromosome X occurred mainly in the low-risk group.
2. Integrative analysis of gene expression in concordance with CNV
It has been demonstrated that modulating gene expression is one of the most important ways in which CNVs play diverse roles in disease[11-14]. In this study, the expression of 5022 genes in TCGA was significantly modulated by CNV (p < 0.05, adjusted p < 0.2), and the expression of 577 genes in TARGET was significantly associated with CNV (p < 0.05, adjusted p < 0.2). A total of 251 genes overlapped between the two cohorts (supplementary table 1). Among them, 52.6% (132/251) of the gene-specific CNVs were located in chromosome 19 (19q13, 19p13) (Fig 2A), and four genes overlapped with the KEGG cancer panel: KEAP1, SIRT6, RHEB, and DNMT1 (Fig 2B).
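The two-threshold filter (raw p and adjusted p) followed by a cross-cohort overlap can be sketched as below. The text does not state which adjustment method was used, so the Benjamini-Hochberg procedure here is an assumption. The gene labels, p-values, and function names are hypothetical placeholders.

```python
from typing import Dict, Set


def benjamini_hochberg(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Benjamini-Hochberg adjusted p-values (assumed method, not confirmed by the text)."""
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        gene, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[gene] = running_min
    return adjusted


def significant_genes(
    pvalues: Dict[str, float], raw_cutoff: float = 0.05, adj_cutoff: float = 0.2
) -> Set[str]:
    """Genes passing both the raw and the adjusted p-value cutoffs."""
    adjusted = benjamini_hochberg(pvalues)
    return {
        gene
        for gene, p in pvalues.items()
        if p < raw_cutoff and adjusted[gene] < adj_cutoff
    }


# Hypothetical per-gene p-values for CNV versus expression in two cohorts.
cohort_a = {"G1": 0.001, "G2": 0.01, "G3": 0.04, "G4": 0.5}
cohort_b = {"G1": 0.03, "G2": 0.02, "G3": 0.3, "G5": 0.001}

adj_a = benjamini_hochberg(cohort_a)
shared = significant_genes(cohort_a) & significant_genes(cohort_b)
```

In this toy input, G3 passes both cutoffs in the first cohort but not in the second, so only G1 and G2 remain in the overlap.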
3. Prognostic value of CNV in AML
To further explore the prognostic value of CNVs, a multivariate Cox proportional hazards regression analysis, adjusted for age and gender, was performed on the TARGET cohort (p < 0.05, adjusted p < 0.05). In total, 758 CNV genes were significantly associated with patient survival. Among them, 102 CNV genes were also significant in the TCGA AML cohort (p < 0.05) (supplementary table 2), including 7 lncRNA and 10 miRNA genes, suggesting potential gene-regulatory roles.
Moreover, the CNV genes with the highest probability of high risk were located in chromosomes 7 and 16 (7q31, 7q32, 7q33, 7q34, 16q24.1) (Fig 3A). In addition, 7.8% (8/102) of these CNVs modulated gene expression in the TARGET cohort, and 52.0% (53/102) modulated gene expression in the TCGA cohort (Fig 3C).
We also compared these 102 genes with the KEGG cancer panel and found that three miRNA genes (MIR29A, MIR183, MIR335) overlapped with the panel (Fig 2B). Cox model analysis adjusted for age and gender additionally showed that MIR335 expression (miRNA sequencing data) was associated with survival in the TARGET cohort (n = 300, p = 0.015, HR: 0.911, 95% CI: 0.844-0.982; Fig 3D, E).
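As a rough consistency check on the MIR335 result, the reported hazard ratio and confidence interval can be converted back to a coefficient, a standard error, and a two-sided p-value. This sketch assumes a Wald-type interval that is symmetric on the log scale, which the text does not state. The function name is hypothetical, and the values are only approximate because the published numbers are rounded.

```python
import math
from typing import Tuple


def wald_from_hr(
    hr: float, ci_low: float, ci_high: float, z_crit: float = 1.959964
) -> Tuple[float, float, float, float]:
    """Recover log-hazard coefficient, standard error, z, and two-sided p from HR and 95% CI."""
    beta = math.log(hr)                                       # Cox coefficient = ln(HR)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)  # CI half-width on log scale
    z = beta / se                                             # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2))                      # two-sided normal p-value
    return beta, se, z, p


# Values as reported in the text for MIR335 expression in the TARGET cohort.
beta, se, z, p_value = wald_from_hr(0.911, 0.844, 0.982)
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p_value:.3f}")
```

Under these assumptions the recovered p-value is approximately 0.016, in line with the reported p = 0.015. The negative coefficient corresponds to an HR below 1, meaning higher MIR335 expression is associated with a lower hazard.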
4. Identification of CNV-modulated gene expression associated with survival
The results showed that the expression of 2058 genes was associated with survival in the TARGET AML cohort, 268 of which were validated in the TCGA cohort. Among these 268 genes, 7 were CNV-regulated genes associated with survival in the TARGET cohort, and 87 were CNV-regulated genes associated with survival in the TCGA AML cohort (Fig 4A). Furthermore, 5 genes in both cohorts had expression that was associated with survival and changed in the same direction as the CNV: CBFB (16q22), CHAF1B (21q22), DNMT1 (19p13), SAE1 (19q13), and SEMA4D (9q22) (Fig 4B). The effects of CNV on the expression of these five genes are shown in Fig 4C-L. Deletions of CBFB and SEMA4D downregulated their expression, which suggests that CNVs of CBFB and SEMA4D may be protective, whereas amplifications of CHAF1B, SAE1, and DNMT1 upregulated their expression.
Additionally, these CNVs were independent of the risk groups: in the TARGET cohort they occurred more frequently in standard-risk patients, and in the TCGA cohort they occurred more frequently in high-risk patients (Fig 4M).