Long non-coding RNAs (lncRNAs) play a significant role in cancer. They are essential transcriptional and post-transcriptional regulators of gene expression. Aberrant gene expression, often the result of genetic alterations, is frequently at the origin of cancer. Human papillomaviruses (HPVs) are small DNA viruses that infect epithelial tissues at various sites in the human body. HPV is considered a leading cause of oral carcinogenesis. In their review, Langfield and Laimins [1] highlighted potential research priorities in HPV-mediated oral cancer. The review stresses the importance of understanding the mechanisms by which productive HPV infections progress to cancer, especially the interaction of the virus with the host’s chromatin remodeling, DNA repair, and differentiation pathways. Oral cancer is a subtype of head and neck cancer, and 90% of cases are oral squamous cell carcinoma. A study by Alexandra Iulia Irimie et al. [2] provides a perspective on ncRNAs and their derivatives. Another paper, by Gibb et al. [3], documents the first evaluation of the lncRNA expression profile of the oral mucosa.
This brings us to gingivobuccal complex cancer, which is predominant in countries where tobacco use is rampant. Very little literature addresses lncRNA expression profiles in gingivobuccal cancer, and most of what exists is outdated. There is therefore a need to properly identify and validate lncRNA expression in the gingivobuccal complex.
Despite strong evidence that lncRNAs play a critical role in the biological processes of human diseases, few efforts have been made to identify the major lncRNAs associated with the disease and to assess their prognostic value. In 2014, Steven B Cogill and Liangjiang Wang [4] published an article that used gene co-expression analysis to identify and annotate lncRNAs and thereby verify their relation to cancer. They used weighted gene co-expression network analysis (WGCNA), which yielded hub lncRNA genes and enriched functional annotation terms within the modules. More recently, Shervin Alaei et al. [5] used similar network construction and module detection with WGCNA to identify novel key regulators in esophageal squamous cell carcinoma. They also performed gene ontology and pathway enrichment analysis, which helped in estimating the biological processes or pathways associated with the co-expressed lncRNAs.
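The core idea behind weighted co-expression networks can be illustrated with a minimal sketch. The code below is not the WGCNA R package and not the pipeline used in [4] or [5]; it only shows, under stated assumptions, how a soft-thresholded adjacency matrix and a simple connectivity-based hub ranking might be computed. The function names, the toy expression matrix, and the soft-threshold power `beta=6` are illustrative assumptions. A full WGCNA analysis additionally uses topological overlap and hierarchical clustering to detect modules, which this sketch omits.

```python
import numpy as np


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency: |Pearson correlation| ** beta.

    expr: 2-D array of shape (samples, genes).
    beta: soft-threshold power (illustrative value, an assumption here).
    """
    cor = np.corrcoef(expr, rowvar=False)  # gene-by-gene correlation matrix
    adj = np.abs(cor) ** beta              # raising to beta suppresses weak correlations
    np.fill_diagonal(adj, 0.0)             # ignore self-connections
    return adj


def connectivity(adj):
    """Sum of connection weights for each gene."""
    return adj.sum(axis=1)


def top_hubs(adj, names, k=3):
    """Return the k gene names with the highest connectivity."""
    order = np.argsort(connectivity(adj))[::-1][:k]
    return [names[i] for i in order]


# Toy data (hypothetical): genes g0..g3 share a common signal, g4..g7 are noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(50, 1))
module_genes = signal + 0.3 * rng.normal(size=(50, 4))
noise_genes = rng.normal(size=(50, 4))
expr = np.hstack([module_genes, noise_genes])
names = [f"g{i}" for i in range(expr.shape[1])]

adj = soft_threshold_adjacency(expr)
print(top_hubs(adj, names, k=4))
```

In this toy setting the genes that share the common signal are strongly connected to one another, so they rank highest by connectivity, which is the intuition behind calling such genes hubs.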
Long intergenic noncoding RNAs (lincRNAs) are long RNA transcripts identified in mammalian genomes through analysis of transcriptomic data. They share features with other long noncoding RNA transcripts; however, long noncoding RNAs are described as molecules, whereas lincRNAs are described as transcripts. Over the years, it has become clear that lincRNAs account for most lncRNAs. Cabili et al. [6] confirmed that lincRNAs are the largest subtype of non-coding RNA molecules in the human genome. Because lincRNAs make up such a large share of lncRNAs, they may also be involved in the prognosis of various cancers. In 2019, Danish Memon et al. [7] published an article on the implications of lincRNA for lung cancer cell survival. This paves the way for future work, as the effects of lncRNAs, including other lincRNAs, on other cancers remain relatively unknown.
A review article by Claire Jean Quartier et al. [8] highlights in silico methods used in cancer research. The article explains the importance of the TCGA database and of methods for computational validation, classification, and prediction using mathematical and statistical analysis.
Cancer recurrence is a major problem for patients who have been treated for the disease, and oral cancer is no exception. Surgery has been the preferred treatment in both cases. Even with advances in chemotherapy and radiotherapy, treatment outcomes remain poor because of local invasion and metastasis, which lead to recurrence. With a survival rate of 30% among patients with recurrent oral cancer, identifying the factors responsible for recurrence becomes crucial. This study aims to identify biomarkers that would help improve prognosis prediction in specific types of oral cancer [9], [10].
Transcriptomics in cancer study
The advent of next-generation sequencing (NGS) brought with it the means to analyze gene expression and correlate it with several diseases, most notably cancer. Basic gene expression analysis techniques existed before NGS, but they served only rudimentary purposes. In 2000, Alizadeh et al. [11] documented one of the first major studies to use gene expression analysis in the diagnosis of diffuse large B-cell lymphoma, making it one of the first studies to use transcriptomics to diagnose cancer.
Subsequent studies have profiled the expression of protein-coding as well as non-protein-coding genes. For years, researchers regarded protein-coding genes as the useful data and treated non-protein-coding genes as junk because they do not code for proteins. However, it was later discovered that non-coding genes carry invaluable information about the regulation and pathways involved in a variety of diseases, cancer being among the most notorious. In oral cancer specifically, numerous studies have examined noncoding genes and their implications for the disease. NGS methodologies have made such analyses feasible. The first generation of sequencing technologies was Sanger sequencing, developed in 1977 and based on the chain termination (dideoxynucleotide) method. The second generation included microarray and RNA sequencing techniques, both of which were considered significant advancements over their predecessor and gradually rendered Sanger sequencing obsolete, barring a few specific exceptions. We are now entering the age of third-generation sequencing, which is still in its infancy but is on its way to displacing its predecessors [12].
Returning to second-generation methodologies, these involve microarray and RNA sequencing techniques powered by Illumina/Solexa sequencing, which relies on a sequencing-by-synthesis approach and is currently one of the most widely used NGS platforms. These techniques generate a large amount of raw data containing valuable information about the genome or transcriptome. The data are then analyzed with bioinformatic tools [12].