Cancer is one of the most feared diseases in the world. According to the Centers for Disease Control and Prevention (CDC), cancer was the second leading cause of death in the United States in 2020, resulting in about 602,350 deaths (Centers for Disease Control and Prevention, 2022). Lung cancer, colon and rectum cancer, and pancreatic cancer were the three leading causes of cancer death, accounting for 23%, 9%, and 8% of total cancer deaths, respectively. According to the American Cancer Society, brain and other nervous system cancers ranked among the top 11 causes of cancer-related deaths (American Cancer Society, 2021). Globally, brain and other central nervous system (BCNS) cancers account for about 3% of all cancers and occur more frequently among men than women (Farmanfarma et al., 2019). According to the available estimates, 308,102 BCNS cancer cases were diagnosed globally in 2020, leading to 251,329 deaths (International Agency for Research on Cancer, 2020). In 2021, there were about 24,530 new brain and other nervous system cancer cases in the United States, and approximately 18,600 people lost their lives to the disease (National Cancer Institute, 2021).
The term “BCNS cancers” is used to describe various forms of cancers or tumors that grow in the brain or the spinal cord, which are often fatal due to their invasive nature and their tendency to resist typical surgical procedures and therapies (Liu & Zong, 2012). Although brain and other central nervous system cancers are rare, they exert a significant social and economic impact on the affected individuals, their families, and the community (Australian Institute of Health and Welfare, 2017). In addition, these cancers pose a huge burden on healthcare systems due to their inherently disabling effects on patients (GBD 2016 Brain and Other CNS Cancer Collaborators, 2019). BCNS cancers can emerge at any age. However, according to data obtained from the United States Central Brain Tumor Registry (dataset from the National Program of Cancer Registries [NPCR] and Surveillance, Epidemiology, and End Results [SEER] registries), malignant brain tumors are most prevalent among males and non-Hispanic White individuals, while benign brain tumors are most common among females and non-Hispanic Black individuals (Liu & Zong, 2012; Ostrom et al., 2019). Analysis of the SEER program data from 1978 through 2017 reveals that BCNS cancer incidence has increased over this period, as shown in Fig. 1 (National Cancer Institute, 2021).
However, between 2008 and 2017, incidence rates for malignant BCNS cancers declined annually by 0.8% for all ages combined. Unfortunately, the rates observed among children and adolescents increased annually by 0.5–0.7% over this same period (Miller et al., 2021). Similarly, a significant increase in the number of primary brain and other central nervous system cancers was observed among the elderly over the past few decades (Maher & McKee, 2003). This upward trend has been particularly reflected in a higher burden of glioblastoma, the most common primary brain cancer. Over the past four decades, there have been a few significant advances in the prevention, early detection, and treatment of such cases. For example, the 5-year survival rate for glioblastoma patients increased from 4% to 7% (Miller et al., 2021).
An essential requirement for successful BCNS cancer treatment is accurate diagnosis (Wong & Yip, 2018). However, brain tumor segmentation is one of the most difficult tasks in medical image processing, owing to the diverse appearance of these tumors and the similarity between cancerous and normal tissues (Hossain et al., 2019). Thus, it is essential to provide practitioners with tools that allow them to extract relevant information from clinical data and to develop treatment plans that are specific to a patient, rather than relying on personal experience, anecdotes, or population-wide risk assessments (Agrawal et al., 2012).
At present, magnetic resonance imaging (MRI) remains the most widely used diagnostic tool, even though interpreting MRI scans requires considerable expertise and may be influenced by the reader's personal experience. Consequently, several authors have recommended the use of algorithms when interpreting imaging results (Upadhyay & Waldman, 2011; Dahab et al., 2012). For example, color-based segmentation using K-means clustering was reported to yield better segmentation results (Datta & Chakraborty, 2011).
Available studies focusing on disease conditions are typically classified as either case-control or cohort studies. A case-control study provides information about a wide range of possible risk factors by comparing the histories of people who have and do not have brain tumors, whereas a cohort study compares the risks in people with and without certain characteristics. However, as brain tumors and other central nervous system cancers are relatively rare, most of the analytical studies have been case-control studies (Wrensch et al., 2002). Data from such sources are vital in the development of predictive models.
In recent decades, machine learning algorithms have gained considerable popularity among cancer researchers and practitioners. In studies based on historical datasets, many machine learning algorithms have been employed to extract prognostic factors and predict patient survival, including artificial neural networks (ANNs) (Delen, 2009), Bayesian networks (Choi et al., 2009), classification trees, k-nearest neighbors (KNN), and support vector machines (SVMs) (García-Laencina et al., 2015). Using ANNs and logistic regression (LR) models, Lundin et al. (1999) were able to predict the 5-, 10-, and 15-year survival of patients with breast cancer with a high degree of accuracy. Boughorbel et al. (2016) compared eight predictive models of breast cancer survival over periods of two, five, eight, and eleven years. Using ANNs and LR, Simsek et al. (2020) explored the impact of several attributes that contribute to breast cancer survival over periods of one, five, and ten years.
Although machine learning is increasingly utilized in cancer research, only a few articles directly address machine learning algorithms for modeling BCNS cancer survival and the validation of those models. Most of the existing research in this domain has focused on traditional statistical analysis with the aim of predicting patient survival or comparing survival rates among BCNS patients (Fang et al., 2020; Sun et al., 2014; Zhu et al., 2020; Rosenberg et al., 2005). For example, Bohn et al. (2018) used information retrieved from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database for the 2010–2014 period to analyze glioblastoma outcomes. To compare overall survival across race groups and assess the effect of race on 3-year survival, the authors used Kaplan-Meier estimates, log-rank tests, and Cox proportional hazards models. The models included attributes such as age, gender, health insurance coverage, primary site, tumor size, the extent of surgery, and year of diagnosis. Although health insurance coverage and year of diagnosis are associated with survival duration, these variables will not be helpful for practitioners assessing prognostic factors and selecting appropriate treatment options for their patients. Senders et al. (2020) developed an online calculator to predict survival in patients with glioblastoma using classical and machine learning algorithms. However, their models incorporated an insurance attribute to improve performance, which will not be helpful for practitioners or cancer researchers whose aim is to establish potential associations between cancer survivors.
To address the shortcomings of these investigations, the present study evaluates the temporal effects of attributes, and of the levels of those attributes, on cancer survival. Understanding the effect of predictive attributes over short- and long-term periods would be useful for estimating the contribution of various factors to BCNS cancer prognosis. To the best of our knowledge, the temporal effects of clinical attributes on brain cancer survival have never been examined. To achieve this goal, a hybrid method based on clinical data is adopted, which involves (a) applying balancing methods, (b) employing extended attribute selection methods, (c) conducting 5-fold stratified cross-validation, and (d) adopting two different types of classification models, gradient boosting (GB) and logistic regression (LR), to investigate the contributions of attributes to BCNS cancer survival over 1-, 5-, and 10-year periods.
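As a rough illustration of steps (a) and (c), the following minimal sketch shows how 5-fold stratified cross-validation can be combined with class balancing applied to the training folds only. It is not the implementation used in this study. The function names `stratified_k_fold` and `random_oversample`, the toy labels, and the choice of random oversampling as the balancing method are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror the full data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets a near-equal share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until all classes are equal in size.

    Random oversampling is an assumed balancing method, chosen only for brevity.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())

    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


# Hypothetical toy labels: 1 = survived the follow-up period, 0 = did not.
labels = [1] * 20 + [0] * 80

for fold_no, (train_idx, test_idx) in enumerate(stratified_k_fold(labels, k=5), start=1):
    # Balance the training portion only, so each test fold keeps the original class ratio.
    balanced_train = random_oversample(train_idx, labels, seed=fold_no)
    n_pos = sum(labels[i] for i in balanced_train)
    print(fold_no, len(test_idx), len(balanced_train), n_pos)
```

In a full pipeline, a classifier such as GB or LR would be fitted on `balanced_train` and evaluated on `test_idx` within each fold. Balancing inside the loop, rather than before splitting, keeps duplicated minority records out of the held-out fold.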
The remainder of this paper is organized as follows. In Section 2, the overall analytic methodology used in this study, as well as the data source and preparation, are described. The results obtained from the actual data analysis are presented in Section 3. Finally, in Section 4, discussion and conclusions are provided and some thoughts on the direction of future research are offered.