Liver cancer is often associated with hepatitis B virus infection, with a progression from chronic hepatitis to hepatocellular carcinoma. Replication of the virus affects the host cell cycle, and many protein-coding oncogenes can become highly expressed. Microarrays offer a high-throughput way to measure gene expression during hepatoma progression. However, microarray data alone lack important information about which genes potentially trigger the disease: they report only the mRNA produced and how its level differs from that of a normal cell line used as a control.
This study aimed to identify potential genes that could serve as predictors for detecting liver cancer, using a heuristic algorithm: simulated annealing optimization. The basic idea of this algorithm is to overcome the combinatorial problem of choosing a gene subset by using acceptance probabilities to select significant features as predictors for classifier models built with representative machine learning algorithms, such as SVM, KNN, Naïve Bayes, C5.0 Decision Tree, and Random Forest. The experimental results showed high performance, above 90% on average. However, the simulated annealing algorithm requires substantial computation time to identify the genes that could be used to detect liver cancer.
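The abstract does not specify the neighborhood move, cooling schedule, or objective function, so the following is only a minimal Python sketch of simulated annealing over gene subsets, under stated assumptions. A subset is a boolean mask over genes; each step flips one gene in or out, improvements are always accepted, and a worse subset is accepted with probability exp(Δ/T), where the temperature T shrinks geometrically. The function name `simulated_annealing_select`, its parameters, and the toy objective `toy_score` are hypothetical illustrations, not the study's code.

```python
import math
import random
from typing import Callable, List, Sequence


def simulated_annealing_select(
    n_features: int,
    score: Callable[[Sequence[bool]], float],
    t_start: float = 1.0,
    t_end: float = 1e-3,
    cooling: float = 0.95,
    steps_per_temp: int = 20,
    seed: int = 0,
) -> List[bool]:
    """Search for a gene mask maximizing `score` (hypothetical helper, not the paper's code)."""
    rng = random.Random(seed)
    # Start from a random subset of genes.
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = list(current), current_score
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            # Neighbor move: flip one randomly chosen gene into or out of the subset.
            candidate = list(current)
            i = rng.randrange(n_features)
            candidate[i] = not candidate[i]
            cand_score = score(candidate)
            delta = cand_score - current_score
            # Metropolis rule: always accept improvements, and accept a worse
            # subset with probability exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = candidate, cand_score
                if current_score > best_score:
                    best, best_score = list(current), current_score
        t *= cooling  # geometric cooling schedule (assumed)
    return best


# Hypothetical toy objective: genes 2, 5 and 7 are "informative",
# and every selected gene costs 0.1 to favor small subsets.
INFORMATIVE = {2, 5, 7}


def toy_score(mask: Sequence[bool]) -> float:
    hits = sum(1 for i, chosen in enumerate(mask) if chosen and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    mask = simulated_annealing_select(10, toy_score)
    print([i for i, chosen in enumerate(mask) if chosen])
```

In a real pipeline, `score` would presumably wrap a classifier's cross-validated accuracy on the genes in the mask. Each call would then train a model, which is consistent with the abstract's observation that selection is far more expensive than classification.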
A large amount of gene expression data can be reduced by identifying potential predictor genes in the liver cancer mechanism. With simulated annealing optimization, almost all of the selected genes were protein-coding genes involved in liver tumor progression. Under cross-validation, the classifiers achieved high performance, although the time required for gene selection was greater than the time required for classification with the machine learning methods.
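The abstract reports cross-validated performance without giving the number of folds or other evaluation details. As an illustrative sketch only, the Python code below estimates the accuracy of a k-nearest-neighbor classifier (KNN is one of the algorithms listed) with 5-fold cross-validation. The fold assignment (round-robin, no shuffling), k = 3, the helper names `knn_predict` and `cross_val_accuracy`, and the two-gene toy data are all assumptions, not the study's setup.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int = 3) -> str:
    """Majority label among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(X: List[Sequence[float]], y: List[str],
                       n_folds: int = 5, k: int = 3) -> float:
    """Fraction of samples predicted correctly when each fold is held out once."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        # Round-robin assignment: sample i belongs to fold i % n_folds.
        test_idx = set(range(fold, n, n_folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in sorted(test_idx):
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / n


# Hypothetical toy data: expression of two selected genes in
# ten normal and ten tumor samples.
X = [(1.0 + 0.1 * i, 1.0) for i in range(10)] + [(5.0 + 0.1 * i, 5.0) for i in range(10)]
y = ["normal"] * 10 + ["tumor"] * 10

if __name__ == "__main__":
    print(cross_val_accuracy(X, y))
```

Because the toy clusters are well separated, every held-out sample is classified correctly and the estimated accuracy is 1.0; real microarray data would give lower and noisier estimates.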