Cancer is the uncontrolled or unregulated growth of abnormal (malignant) cells that arise from the cells of a specific organ and can occur anywhere in the body. It results from the loss of normal control over cell growth; cancer cells are able to create their own blood supply, break away from the organ of origin, and travel and spread to other organs of the body (1). Cancer is a genetic disease and can be either inherited or sporadic. It is caused by agents such as chemical carcinogens that induce DNA mutations, repeated injury, radiation (ionizing radiation and ultraviolet radiation), hormones that stimulate uncontrolled cell growth, genetic abnormalities, immunological dysfunction, and viruses such as human papillomavirus, hepatitis B virus, and hepatitis C virus (2).
Cancer has become a major threat to humanity because of its fast growth rate and genomic complexity, and it remains a frequently lethal disease in humans (the second most frequent cause of death) despite significant progress in its diagnosis (3). There are many types of cancer, including esophageal, oral, bladder, colon, ovarian, lung, breast, gastric, pancreatic, prostate, and testicular cancer, as well as lymphoma, leukemia, glioma, melanoma, and hepatoma. Pancreatic cancer is a common malignant tumor of the digestive tract with a high degree of malignancy and a poor prognosis (4). Cervical cancer is one of the most common gynecological cancers and is attributed to high sexual activity with multiple partners, infrequent condom use, and immunosuppression (5).
Cancer diagnosis involves both computational and non-computational methods. Non-computational methods include imaging, in which malignancy is suspected on the basis of structural and anatomic information and is later confirmed by histology. Imaging tools also permit functional, biochemical, and physiologic assessment of important aspects of malignancy. Sites commonly imaged include the breast, brain, lung, and mediastinum, using modalities such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Ultrasound, Positron Emission Tomography (PET), and Magnetic Resonance Spectroscopy (MRS).
Other non-computational methods include the Fluorescence In Situ Hybridization (FISH) technique, tumor markers, and cytologic and histopathological techniques, which involve special staining procedures that can differentiate between types of tumors. For example, toluidine blue stain differentiates mast cell tumors from other tumors because it stains the metachromatic granules present in mast cells. Serological methods such as Enzyme-Linked Immunosorbent Assay (ELISA) and Radioimmunoassay (RIA) are used to estimate serum tumor markers. Immunohistochemistry can also be performed using polyclonal and monoclonal antibodies to detect specific antigenic determinants present in the cells of tissues. Polymerase Chain Reaction (PCR) is also used to establish a definitive diagnosis and classification of tumors based on the recognition of complex profiles that occur in specific tumor types (2).
These non-computational methods, however, face several challenges. One of them is low sensitivity and specificity. In cancer diagnosis, the sensitivity of a method is its ability to correctly identify the people in a population who have cancer, while its specificity is its ability to correctly identify the people in a population who do not have cancer. The sensitivity and specificity of most non-computational cancer screening methods are in the range of 70–80% and 60–70% respectively (6).
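The two definitions above can be made concrete with a minimal sketch in Python. The counts used here are hypothetical and chosen only so that the results fall inside the ranges quoted above; they are not taken from any of the cited studies.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of people who have cancer that the test correctly flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of people without cancer that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening outcome: 100 people with cancer, 200 without.
tp, fn = 75, 25    # people with cancer: detected / missed
tn, fp = 130, 70   # people without cancer: cleared / falsely flagged

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these illustrative counts, a quarter of the people with cancer are missed and roughly a third of the people without cancer are flagged for follow-up testing.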
In line with the low sensitivity and specificity of the non-computational methods, the Positive Predictive Value (PPV) of the tumor biomarkers currently in use is also very low, which has led to the failure of cancer screening tests (6). For example, the Pap smear, a cervical screening method, has sensitivity, specificity, and PPV values of 55.5%, 75%, and 88.2% respectively. These values are regarded as low, so a biopsy, which is highly invasive, is required in most cases (7). Because of the low sensitivity of these methods, multiple tests are usually needed, and these are quite costly. For example, a study evaluating the cost of breast cancer screening in the United States of America found that about 410 million dollars are spent on breast cancer screening by women above the age of 75 years (8). This necessitates the development of genomic bioinformatics technologies to uncover blood-based tumor markers, which can greatly improve specificity and thus boost the accuracy of the cancer screening process (6).
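As an illustration of how the Positive Predictive Value relates to sensitivity and specificity, the sketch below applies the standard Bayes relationship between them. It uses the Pap smear sensitivity and specificity quoted above; the prevalence values are assumptions made for illustration and do not come from the cited study.

```python
def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Probability that a person with a positive test actually has the disease."""
    true_pos = sens * prevalence                # diseased and test-positive
    false_pos = (1 - spec) * (1 - prevalence)   # healthy but test-positive
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity as quoted in the text; prevalences are assumed.
SENS, SPEC = 0.555, 0.75
ppv_by_prevalence = {
    prevalence: positive_predictive_value(SENS, SPEC, prevalence)
    for prevalence in (0.01, 0.10, 0.50)
}

for prevalence, ppv in ppv_by_prevalence.items():
    print(f"prevalence = {prevalence:.2f} -> PPV = {ppv:.3f}")
```

The sketch shows that, for fixed sensitivity and specificity, the PPV falls as the disease becomes rarer in the screened population, which is one reason a PPV reported in one study population may not carry over to general screening.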
Microarray technology has been widely used in cancer research, diagnosis, and tumor classification for more than a decade. It has been extensively adopted because conventional techniques of gene investigation in cancer are time-consuming and not cost-effective. Because of their small size, microarrays are well suited to surveying a large number of genes quickly or to studies in which the sample is small. One of the earliest applications of microarrays was to identify differences in gene expression between normal and cancer cells. DNA microarray analysis involves the use of oligonucleotide chips, cDNA chips, and genomic chips. Oligonucleotide microarrays are used for gene expression, Single Nucleotide Polymorphism (SNP), mutation, and genotyping analyses, while cDNA microarrays are usually used for gene expression analysis (9).
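The idea of comparing gene expression between normal and cancer cells can be sketched as a log2 fold-change calculation. This is a minimal illustration, not the method of any cited study: the gene names, intensity values, and the two-fold cutoff are all hypothetical, and a real analysis would also require normalization and statistical testing.

```python
import math
from statistics import mean

# Hypothetical expression intensities (arbitrary units) for three
# placeholder genes, measured in three normal and three tumor samples.
normal = {
    "GENE_A": [100.0, 120.0, 110.0],
    "GENE_B": [200.0, 210.0, 190.0],
    "GENE_C": [400.0, 420.0, 380.0],
}
tumor = {
    "GENE_A": [430.0, 450.0, 440.0],
    "GENE_B": [195.0, 205.0, 200.0],
    "GENE_C": [100.0, 90.0, 110.0],
}


def log2_fold_change(tumor_values, normal_values):
    """log2 of the ratio of mean tumor expression to mean normal expression."""
    return math.log2(mean(tumor_values) / mean(normal_values))


CUTOFF = 1.0  # assumed threshold: |log2 fold change| >= 1, i.e. at least two-fold

results = {gene: log2_fold_change(tumor[gene], normal[gene]) for gene in normal}

for gene, lfc in results.items():
    if lfc >= CUTOFF:
        label = "up in tumor"
    elif lfc <= -CUTOFF:
        label = "down in tumor"
    else:
        label = "unchanged"
    print(f"{gene}: log2FC = {lfc:+.2f} ({label})")
```

A positive value indicates higher expression in the tumor samples and a negative value indicates lower expression; genes passing the cutoff would be candidates for further study rather than confirmed biomarkers.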
Microarray technology is a powerful platform for biological exploration. Microarrays permit the simultaneous analysis of hundreds to thousands of expressed DNA sequences for genomic research and diagnostic applications, which promises to revolutionize the way gene expression is examined. In addition to monitoring and analyzing gene expression patterns, microarrays are broadly used to understand the genetic and epigenetic makeup of cancer cells. They are also used to decipher the signaling pathways of cancer-relevant transcription factors, which has advanced the scientific understanding of how these transcription factors control gene networks and, ultimately, cancer development. Microarray technology also allows the status of a cell to be investigated at the molecular scale and can identify a given cell type by its gene expression profile. This is pivotal for future cancer diagnosis because traditional methods cannot distinguish between morphologically similar but molecularly different tumors, and these molecular differences significantly affect the clinical course of a disease (9).
Although the microarray technique is promising as a future modality for tumor diagnosis and classification, it has many limitations. Individual tumors cannot be accurately diagnosed by gene expression profile alone, because specific groups of biomarkers for the diagnosis of specific tumors have not yet been developed and because data analysis tools and methods still need to be developed. In addition, the cost of microarray experiments is high, gene expression data show remarkable variation within the same tumor, and early tumor detection is not possible from the gene expression profile. Despite these limitations, DNA microarrays are best used for molecular classification based on genetic and biological changes. Overall, microarrays are a major tool for investigating global gene expression in all aspects of human disease and biomedical research (H. Kim, 2004).
Analyzing the vast amounts of data generated by microarray technology requires computational methods, hence the need for bioinformatics approaches. Bioinformatics is an interdisciplinary approach that integrates information technology with biological science and involves the creation of databases, the development of software, and data handling for large-scale interpretation and analysis. Bioinformatics has three aims: first, to organize data so that researchers can easily access existing data and submit new data to a database; second, to develop tools and resources that aid in data analysis; and finally, to use these tools to analyze and interpret data in a biologically meaningful manner (11). Bioinformatics has applications in several fields, such as agriculture, medical science, forensic science, and the pharmaceutical and biotechnology industries (12).
Among the medical applications, and particularly in cancer, which is one of the main diseases claiming lives all over the world, bioinformatics has grown rapidly, keeping pace with the expansion of genome sequence data (13). Bioinformatics tools and technologies such as web technology, Cytoscape, and Gene Expression Profiling Interactive Analysis (GEPIA), together with databases such as those of the National Center for Biotechnology Information (NCBI), the Gene Expression Omnibus (GEO), the Surveillance, Epidemiology, and End Results (SEER) database, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are being used in cancer research and diagnosis to identify biomarkers by analyzing entire gene expression profiles, approaching the disease at the genome level. Bioinformatics is applied in the diagnosis of cervical cancer, pancreatic cancer, breast cancer, lung cancer, and several other types of cancer. The advancement of bioinformatics technology has thus resulted in faster diagnosis, identification, and prevention of cancer, offering a sustainable solution (13).