Background: Mutational processes leave distinct signatures in genes. Previous studies have suggested that both the mutated base and its flanking bases influence the characteristics of somatic mutations. However, how flanking sequences shape these characteristics remains poorly understood.
Materials and methods: We constructed an unsupervised neural network that combines long short-term memory (LSTM) with a self-organizing map (SOM), termed LSTM-SOM. The LSTM extracted features from mutated sequences and the SOM clustered similar features, so that somatic mutations in The Cancer Genome Atlas (TCGA) database were grouped according to their mutation type and flanking sequences. We then analyzed the relationship between the resulting mutation classes and cancer characteristics. Finally, we used K-means to cluster patients according to the composition of these mutation classes, and compared clinical features and survival between the patient classes.
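The mutation-clustering step can be illustrated with a small, self-contained sketch. It is a simplification under stated assumptions, not the authors' implementation: the LSTM feature extractor is replaced by a plain one-hot encoding of a trinucleotide context (one 5' base, the reference base, the alternate base, one 3' base), and that encoding is clustered directly with a minimal online self-organizing map. The helper names (`encode_context`, `best_matching_unit`, `train_som`), the Gaussian neighborhood, the linearly decaying learning rate and radius, and the sample-based initialization are all hypothetical illustrative choices.

```python
import math
import random

BASES = "ACGT"


def encode_context(flank5: str, ref: str, alt: str, flank3: str) -> list[float]:
    """One-hot encode a point mutation (ref>alt) with its 5' and 3' flanking bases.

    Hypothetical stand-in for the LSTM features used in the LSTM-SOM model.
    """
    vec: list[float] = []
    for base in flank5 + ref + alt + flank3:
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def best_matching_unit(weights: dict, x: list[float]) -> tuple[int, int]:
    """Return the grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data: list[list[float]], rows: int, cols: int,
              epochs: int = 30, lr0: float = 0.5, seed: int = 0) -> dict:
    """Train a rows x cols SOM with a Gaussian neighborhood and linearly decaying rates."""
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Sample initialization: each node starts as a copy of an evenly spaced input vector.
    weights = {node: list(data[i * len(data) // len(nodes)]) for i, node in enumerate(nodes)}
    sigma0 = max(rows, cols) / 2
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total  # in [0, 1), so lr and sigma stay positive
            lr = lr0 * (1 - frac)
            sigma = sigma0 * (1 - frac)
            bmu = best_matching_unit(weights, x)
            # Pull every node toward x, weighted by its grid distance to the BMU.
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


if __name__ == "__main__":
    a = encode_context("T", "C", "T", "G")  # T[C>T]G
    b = encode_context("A", "T", "G", "A")  # A[T>G]A
    data = [a] * 5 + [b] * 5
    som = train_som(data, rows=1, cols=2)
    # The two contexts should land on different nodes of the 1-by-2 map.
    print([best_matching_unit(som, x) for x in data])
```

Because the SOM here sees only one-hot contexts, the sketch illustrates the clustering mechanics rather than the LSTM-SOM model itself. In the described pipeline the SOM input would be the features extracted by the LSTM, and how the ten MBs were obtained (map size, training schedule) is not specified in this abstract.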
Results: Clustering 2,141,527 somatic mutations yielded ten classes of mutant sequences, which we named mutation blots (MBs). The MBs differed in the features of their mutated bases and flanking sequences. MBs reflected both the site and the pathological features of cancers, and were related to clinical features including age, sex, and cancer stage. The MB class of mutations in a given gene was associated with survival. Finally, patients were clustered into 7 classes according to their MB composition, and these patient classes differed significantly in survival and clinical features.
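The patient-level step (K-means on each patient's MB composition) can be sketched in the same hedged way. Here each patient is summarized by the fraction of their mutations assigned to each MB class, and these composition vectors are grouped with plain Lloyd's K-means. Using fractions rather than raw counts, initializing centroids from randomly chosen patients, and the names `mb_composition`, `_nearest`, and `kmeans` are assumptions for illustration; the study's exact normalization and K-means settings are not given in this abstract.

```python
import random
from collections import Counter


def mb_composition(mb_labels: list[int], n_mb: int = 10) -> list[float]:
    """Fraction of one patient's mutations in each MB class 0..n_mb-1.

    Hypothetical normalization; assumes the patient has at least one mutation.
    """
    counts = Counter(mb_labels)
    total = len(mb_labels)
    return [counts[m] / total for m in range(n_mb)]


def _nearest(point: list[float], centroids: list[list[float]]) -> int:
    """Index of the closest centroid (squared Euclidean; ties go to the lower index)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))


def kmeans(points: list[list[float]], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's K-means with centroids initialized from k randomly chosen points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each patient goes to its nearest centroid.
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy cohort with made-up MB labels: three patients dominated by MB 0/1, three by MB 7/8.
    patients = [[0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1], [7, 8, 8], [8, 7, 8], [8, 8, 7]]
    compositions = [mb_composition(p) for p in patients]
    labels, _ = kmeans(compositions, k=2)
    print(labels)
```

With `k=2` the two toy groups separate cleanly; running the same routine with `k=7` on real MB compositions would mirror the setting reported here. K-means results depend on initialization, so running several seeds and keeping the solution with the lowest within-cluster sum of squares is a common safeguard.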
Conclusions: Our study provides a novel method for analyzing the information carried by mutant sequences and reveals extensive relationships among mutant sequences, clinical features, and the survival of cancer patients.