Background
Genes contain the chemical bases that encode proteins and thereby influence the foundations of life. Mutations are changes in a gene that can affect the function of the encoded protein. When a mutation causes uncontrolled cellular proliferation, cancer arises. Mutations classified as drivers provide a growth advantage that promotes tumor progression, whereas passenger mutations do not.
Methods
The goal of this research is to develop an effective classification system that discriminates between driver and passenger mutations. This article presents a new gene identification and segregation model consisting of five primary steps: (a) pre-processing, (b) treatment of class imbalance, (c) feature extraction, (d) feature selection, and (e) gene classification. To improve data quality, the raw data are first pre-processed through data cleaning and data normalization, which converts them into a usable and effective form. In practice, the dataset is skewed: driver mutation labels occur in far fewer instances than passenger mutation labels. To tackle this class imbalance problem, the pre-processed data are balanced using an enhanced K-Means + SMOTE technique. The most significant characteristics, such as gene-level and mutation-level features, are then extracted from the balanced dataset. To reduce computation time, the optimal features are selected from the extracted features using Forensic Interpretation Customized Hunger Food Search Optimization (FIHFSO), which conceptually combines the traditional Hunger Games Search (HGS) and Forensic-Based Investigation Optimization (FBIO). The features selected by FIHFSO are used to train the deep learning classifier that performs the segregation. A new improved Recurrent Neural Network (I-RNN) is introduced in this study to make the final decision about each gene (i.e., to classify it as a driver or passenger gene). Finally, the proposed model is validated to demonstrate its superiority in classification.
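The class-imbalance step (b) can be illustrated with a minimal sketch of the general K-Means SMOTE idea: cluster the data, keep only clusters in which minority (driver) samples are well represented, and create synthetic minority samples by interpolating between nearby minority samples inside those clusters. This is not the enhanced variant used in this work, whose modifications are not described here. The function names `kmeans` and `kmeans_smote`, the farthest-point initialisation, the 0.5 minority-share threshold, the round-robin allocation across clusters, and the toy data are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point (maximin) initialisation.

    Returns a cluster index for every point.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (kept if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmeans_smote(X, y, minority, k=3, neighbors=2, min_share=0.5, seed=0):
    """Hypothetical helper: oversample `minority` until it matches the largest class.

    Simplified K-Means SMOTE (an assumption, not the enhanced variant of this work):
    only clusters whose minority share is at least `min_share` are oversampled,
    and synthetic samples are spread round-robin over those clusters.
    """
    rng = random.Random(seed)
    labels = kmeans(X, k)
    counts = Counter(y)
    n_needed = max(counts.values()) - counts[minority]
    eligible = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mins = [X[i] for i in idx if y[i] == minority]
        # Need at least two minority samples in the cluster to interpolate between.
        if len(mins) >= 2 and len(mins) / len(idx) >= min_share:
            eligible.append(mins)
    if n_needed > 0 and not eligible:
        raise ValueError("no cluster passed the minority-share filter")
    X_new, y_new = [list(x) for x in X], list(y)
    for j in range(n_needed):
        mins = eligible[j % len(eligible)]
        b = rng.randrange(len(mins))
        base = mins[b]
        # SMOTE step: pick one of the base sample's nearest minority neighbours ...
        others = sorted((m for i, m in enumerate(mins) if i != b),
                        key=lambda m: math.dist(base, m))
        nb = rng.choice(others[:neighbors])
        # ... and place a synthetic sample at a random point on the segment between them.
        gap = rng.random()
        X_new.append([bv + gap * (nv - bv) for bv, nv in zip(base, nb)])
        y_new.append(minority)
    return X_new, y_new


if __name__ == "__main__":
    # Toy 2-D data (hypothetical): two passenger clusters and one small driver cluster.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.15, 0.05],
         [5.0, 5.0], [5.2, 5.1], [4.9, 5.3], [5.1, 4.8],
         [9.0, 0.0], [9.2, 0.3], [8.8, 0.1]]
    y = ["passenger"] * 8 + ["driver"] * 3
    X_bal, y_bal = kmeans_smote(X, y, minority="driver")
    print(Counter(y_bal))  # expected: 8 of each class
```

Restricting interpolation to minority-rich clusters is the design choice that distinguishes this family of methods from plain SMOTE: synthetic drivers are generated only where real drivers already dominate, which reduces the risk of placing them inside passenger regions.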
Results
The I-RNN model was compared with existing classifiers, namely CNN, LSTM, DBN, Bi-GRU, SVM, DRIVE (Dragomir et al., 2021), and EARN (Mirsadeghi et al., 2021). The I-RNN model achieved the highest accuracy, 95.5%, outperforming the existing models. This performance improvement is mainly due to the MSE loss function incorporated into it. The I-RNN model also recorded the lowest false positive rate (FPR) and false negative rate (FNR).
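The reported metrics follow standard confusion-matrix definitions. The sketch below computes accuracy, FPR and FNR, treating "driver" as the positive class (an assumption; the abstract does not state which class is positive), together with a plain mean squared error between 0/1 targets and predicted probabilities. How the MSE loss is wired into the I-RNN is not specified here, so the `mse` helper only illustrates the formula; the function names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="driver"):
    """Return (TP, FP, TN, FN), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(y_true, y_pred, positive="driver"):
    """Accuracy, FPR = FP / (FP + TN) and FNR = FN / (FN + TP).

    A rate whose denominator is zero is reported as 0.0 (an assumed convention).
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return accuracy, fpr, fnr


def mse(targets, probs):
    """Mean squared error between 0/1 targets and predicted probabilities."""
    return sum((t - q) ** 2 for t, q in zip(targets, probs)) / len(targets)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from this study.
    y_true = ["driver", "driver", "passenger", "passenger", "passenger"]
    y_pred = ["driver", "passenger", "passenger", "driver", "passenger"]
    print(rates(y_true, y_pred))  # (0.6, 0.3333333333333333, 0.5)
    print(mse([1, 0, 1], [0.9, 0.2, 0.6]))
```

On these toy labels, one of the two drivers is missed (FNR 0.5) and one of the three passengers is flagged as a driver (FPR of about 0.33), so lowering both rates means fewer missed driver genes and fewer false alarms.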
Conclusion
Owing to its comparatively high accuracy, the proposed model is highly effective for gene classification. The quantitative identification and segregation of passenger and driver genes in cancer datasets will contribute to precision medicine in oncology.