One of the key challenges in classifying multiple cancer types is the complexity of Tumor Protein p53 (TP53) mutation patterns and their individual effects on tumors. However, far too little attention has been paid to deep reinforcement learning on TP53 mutation patterns, because its results are extremely difficult to interpret. We introduce a critic network based on long short-term memory (LSTM), which is suited to discriminating the noise samples produced by a Feedback Generative Adversarial Network and to analyzing the actor network. The correlation and analysis of the results in a belief network demonstrate significant relations between mutations and disease risk in cancer subtype identification. In other words, the results indicate statistically significant differences between the primary and secondary subtype groups of the most probable tumor.