Cancer is a multifactorial disease rooted in environmental and lifestyle factors such as diet, pollution, tobacco use, and physical activity, in addition to genetic predisposition. The current lack of fully curative treatments has spurred the search for early markers or predictors of neoplasm development.
Our study aimed to develop a Random Forest classifier for predicting cancer in Iranian men and women from dietary habits. We used more than fifty lifestyle-related risk factors obtained from 252 participants. Although comprehensive data were collected, we deliberately excluded from the model some features whose effects on cancer development have long been established [20, 21], in order to focus on less recognized factors that have not yet been modified despite evidence of their harmful role.
We found four risk factors that are strong predictors of cancer: the use of frozen vegetables; fried vegetables kept in the fridge and golden-fried potatoes, eggplants, and squash; plastic bags to store bread; and plastic containers for preserving tomato juice and pickles.
According to the model explanation, storing vegetables in the freezer was a key factor in cancer prediction. By contrast, no evidence suggests that frozen foods cause cancer. Indeed, refrigeration may even help reduce cancer risk by preventing the growth of carcinogenic microorganisms [22]. Nevertheless, according to our proposed model, replacing frozen vegetables with dried ones could be useful advice for protecting against cancer and moving toward a healthier diet. One possible explanation is that people consume fewer fresh vegetables and salads when frozen products, which are lower in antioxidant content, are available [23].
Frying, a popular food processing method, was associated with cancer in the present study. Numerous studies have confirmed that the production of cytotoxic, neurotoxic, and mutagenic compounds, such as malondialdehyde (MDA), a peroxidation product of frying oils, is a contributing factor [24]. Moreover, vegetable oils that are repeatedly heated to high temperatures produce various byproducts, such as polycyclic aromatic hydrocarbons (PAHs), which are well known to be toxic [25]. Our findings align with existing data indicating that increased consumption of deep-fried foods raises the chance of cancer development.
The presence of bread storage in plastic bags among the four top-ranked features deserves particular attention. As a cultural habit in Iran, people put fresh, warm (even hot) bread in plastic bags and keep it there until they use it. Our model ranks this behavior among the most important features for predicting cancer, yet it is readily modifiable by encouraging people to use fabric or paper bags to keep bread, especially when it is hot. Moreover, reusing plastic bags several times, which is common here, may intensify the harmful effect of this kind of packaging. Chemicals in food containers are thought to migrate into food and unintentionally become part of the human diet [26]. More than 148 chemicals have been identified that have toxicological and endocrine-disrupting properties, accumulate in the body, and are associated with cancer, including melamine, bisphenol A, and DBP, which have been implicated as carcinogens in prostate and breast cancer [27, 28].
Moreover, in some cases, specific compounds with antimicrobial properties are added to plastic containers to keep food or water fresher. Silver nanoparticles, one such compound, migrate into food in an acidic environment, similar to the previously mentioned compounds [29]. Although few studies have examined the amount and size of nano-silver particles transferred from container to food, this migration can be considered a potential issue when acidic liquids such as vinegar or tomato juice are stored in plastic containers.
Random Forest, a widely used machine learning algorithm, performs strongly, especially on small to medium-sized datasets [30]. In this study, the random decision forest was also the algorithm of choice for automated feature selection, because it identifies the features most pivotal to prediction accuracy. Our results showed that the model based on 10 features performed best, achieving 97% accuracy, 94% precision, and 97% AUC in predicting cancer.
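The feature-selection workflow described above can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data in place of the questionnaire responses. The train/test split, the number of trees, and the use of impurity-based importances are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the questionnaire data (assumption): 252 participants,
# 50 candidate features, binary case/control label.
X, y = make_classification(
    n_samples=252, n_features=50, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: fit a forest on all features (training data only, to avoid leakage)
# and rank the features by impurity-based importance.
ranker = RandomForestClassifier(n_estimators=200, random_state=0)
ranker.fit(X_train, y_train)
top10 = np.argsort(ranker.feature_importances_)[::-1][:10]

# Step 2: refit a forest using only the ten top-ranked features.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train[:, top10], y_train)

# Step 3: evaluate on the held-out participants.
pred = model.predict(X_test[:, top10])
proba = model.predict_proba(X_test[:, top10])[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "auc": roc_auc_score(y_test, proba),
}
print("Top-10 feature indices:", top10)
print(metrics)
```

Ranking the features on the training split only keeps the held-out metrics from being inflated by the selection step. With a sample of this size, repeating the procedure over several cross-validation folds would likely give more stable estimates than a single split.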
Although the Random Forest model has been applied in various studies to predict conditions such as cancer, obesity, heart disease, and stroke [31–34], to the best of our knowledge no study has used it to investigate the factors we considered in cancer development. The presented algorithm may serve as a prognosis support tool that helps clinicians emphasize factors that have received little attention so far but, according to our results, are serious risk elements. Notably, the algorithm relies on inexpensive, non-invasive inputs such as sociodemographic, lifestyle, and self-reported features, which makes it easy to apply.
The main limitation of our study is the sample size. Although we included all eligible people with diagnosed cancer, the case group did not exceed 126 subjects, which means the results may not be generalizable. In addition, incomplete data on some questions forced us to omit them. Future studies with larger sample sizes and more complete questionnaire responses will make the results more generalizable.