Clustering of metabolic risk factors and cancer risk: a Self-Organizing Map approach

Background: We examined the synergistic effects of multiple metabolic risk factors (MRFs) on cancer risk among Iranian adults. Methods: Among 8593 (3929 men) participants aged ≥30 years, a self-organizing map (SOM) was applied to cluster four MRFs: high fasting plasma glucose (HFPG), high total cholesterol (HTC), high systolic blood pressure (HSBP) and high body mass index (HBMI). The Cox proportional hazards model was used to investigate the association between clusters and cancer incidence during a median follow-up of 14.0 years. Results: About 32% of men and 40% of women had three or four MRFs. We identified seven clusters of MRFs in both men and women. In both genders, MRFs clustered in older participants. Further, smoking in men and education level in women were inversely associated with clustering of MRFs. In men, a cluster with 100% HSBP and HBMI had the highest risk of overall cancer, whereas among women, a cluster with 100% HFPG and 93% HBMI had the highest risk. The risk was lower when HBMI was accompanied by HTC. Conclusions: Clustering patterns may reflect an underlying link between MRFs and cancer and could potentially facilitate tailored health promotion interventions.

among them, HFPG and HBMI have been associated with increased risk of several of the more common cancers (4).
A large number of studies have repeatedly shown that some MRFs tend to cluster and co-occur in individuals (5,6), and clusters of these factors may have synergistic properties, such that the combined effect of these factors is much worse than the sum of each risk factor in isolation (7). Although growing evidence shows a relation between single MRFs and risk of cancer (8-10), very little research has examined the association between clusters of MRFs and cancer incidence (7). Because of the synergistic effects of multiple MRFs, identifying clusters of these factors can help in implementing multiple interventions at the population level to reduce the risk of cancers (11). The purpose of the present study was therefore to identify: 1) distinct clusters of MRFs among Iranian adults, 2) the association between identified clusters and demographic, social and behavioral characteristics, and 3) the relation between different clusters of MRFs and incidence of cancers.
We applied the self-organizing map (SOM) (12) to identify different clusters of four MRFs (HFPG, HTC, HBMI, and HSBP) among participants. The SOM is an unsupervised neural network used for clustering and data visualization. SOMs simplify complex, multidimensional data and have been used in a broad range of fields, including medicine (6,13). To accomplish our objectives, we analyzed data from the Tehran Lipid and Glucose Study (TLGS).

Study population
The TLGS is a large-scale, long-term, community-based prospective study performed on a representative sample of residents of district 13 of Tehran. The study is described in detail elsewhere (14). In brief, about 15000 individuals aged ≥3 years participated in the first phase of the study (1999-2001), and a total of 3550 new subjects were included in phase 2 (2002-2005). Annual follow-up of the cohort for different outcomes ran from study entry until the end of the study (20 March 2014). For the current study, all subjects aged ≥30 years from the first and second phases (n=9553) were included.
Participants with previous cancer at baseline (n=52), without any follow-up data until the end of the study (n=893), and with survival time under 365 days (n=15) were excluded.
The remaining 8593 (3929 men) participants (90% of the eligible sample) contributed until the censoring date, first cancer occurrence, or death from any cause (Figure 1). The ethics committee of the Research Institute for Endocrine Sciences of Shahid Beheshti University of Medical Sciences approved the study, and informed written consent was obtained from all participants.

Baseline data collection
Standardized interviewer-administered questionnaires were used to obtain data on demographic characteristics, smoking status, educational level, physical activity and prescription medications. Height and weight were measured in standardized ways and BMI was calculated as weight (kg)/height² (m²). Sitting blood pressure (BP) was measured twice by trained technicians, at least 15 minutes apart, and the mean value was considered as the subject's BP. A single blood sample was drawn following an overnight fast to determine the FPG and TC following a standardized protocol (14). Physical activity level was assessed using the Lipid Research Clinic (15) questionnaire in the first phase of the study. From the second phase, it was replaced by the Modifiable Activity Questionnaire to obtain a quantitative measure of physical activity level (16).

Definition of confounders
Educational attainment was categorized into three groups: 0-8 years, 9-11 years and over 12 years. A current smoker was defined as someone who currently used cigarettes or other smoking implements daily, non-daily or occasionally. Former smokers were defined as subjects who had smoked daily or occasionally and had since quit.
Never smokers were adults who reported not smoking any product during their entire life.
In the first phase, low physical activity was defined as doing exercise or labor fewer than three times a week; in the second phase, it was defined as achieving a score ≤600 MET (metabolic equivalent task)-minutes per week (17).

Follow-up and outcome classification
Subjects with no history of cancer were followed annually from baseline examinations until their first cancer diagnosis, death, emigration, or end of study (March 20, 2014), whichever occurred first. The main outcome was a composite of cancer types. Death certificates and the medical records of the hospitalized patients were used to supplement the information on cancer incidence. Only patients with pathologically proven cancer leading to hospitalization were enrolled. The collected information was then evaluated by an outcome committee consisting of an endocrinologist, an internist, a cardiologist, an epidemiologist, and other experts. The diagnoses of cancers, including histologic types, were coded according to the International Classification of Diseases 10th Revision. Time to event was defined as time of censoring or having the cancer, whichever occurred first. We censored individuals at the time of other causes of death, loss to follow up during the study period (who had at least one successful phone contact during the study period), and being in the study until 20 March 2014 without cancer occurrence.
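The follow-up rule described above (stop at the first of cancer diagnosis, death, loss to follow-up or emigration, or end of study; only cancer counts as the event) can be sketched as follows. This is an illustration only, not the study's code: the function name `time_to_event`, its arguments, and the conversion of days to years with 365.25 are assumptions made for the example.

```python
from datetime import date

STUDY_END = date(2014, 3, 20)  # end of follow-up stated in the text

def time_to_event(entry, cancer=None, death=None, lost=None, end=STUDY_END):
    """Return (follow-up in years, event flag) for one hypothetical participant.

    event = 1 only if the first cancer diagnosis is the earliest stop date;
    death, loss to follow-up/emigration and end of study are censoring (0).
    """
    candidates = [(end, 0)]
    if cancer is not None:
        candidates.append((cancer, 1))
    if death is not None:
        candidates.append((death, 0))
    if lost is not None:
        candidates.append((lost, 0))
    # Earliest date wins; on a tie the cancer event takes precedence.
    stop, event = min(candidates, key=lambda c: (c[0], -c[1]))
    years = (stop - entry).days / 365.25  # assumed day-to-year conversion
    return years, event

# Hypothetical participant: enrolled in 2000, diagnosed with cancer in 2010.
years, event = time_to_event(date(2000, 3, 20), cancer=date(2010, 3, 20))
```

The resulting pair (time, event) is the usual input format for survival models such as the Cox model.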

Statistical methods
A chi-square goodness of fit test and an independent samples t-test were used to compare categorical and continuous variables, respectively. Incidence density rates of cancer were calculated by dividing the number of events by the person-years at risk.
Missing data (after applying the exclusion criteria) were 2.5, 1.4, 2.7 and 3.0% in men, and 3.7, 1.6, 6.0 and 5.8% in women for smoking status, BMI, TC and physical activity level, respectively. Multivariate imputation by chained equations (MICE; mice package in R) (18) was therefore used to handle missing data.
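As a small worked illustration of the incidence density rate, the sketch below divides events by person-years and scales to 1000 person-years, the unit used for the cluster-level rates in the results. The function name and the example counts are hypothetical and are not taken from the study data.

```python
def incidence_rate_per_1000(events, person_years):
    """Incidence density rate: events per 1000 person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * events / person_years

# Illustrative numbers only: 12 events over 4000 person-years.
rate = incidence_rate_per_1000(12, 4000)  # 3.0 per 1000 person-years
```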
The SOM was applied separately for men and women to identify clusters of individuals with similar patterns of the four MRFs. The SOM is a nonparametric, unsupervised learning technique based on neural networks that groups similar individuals according to multivariate distance and forms a low-dimensional map of the training dataset. Typically, a SOM consists of neurons (units or cells) organized on a regular two-dimensional grid, usually represented as cells on a hexagonal or rectangular lattice. Individuals with similar characteristics are placed close together on the SOM grid, whereas individuals far apart on the map differ from each other (12). To assess cluster quality, we used the silhouette width index, which compares the tightness of the grouping of subjects within each cluster with the separation between clusters. The value of this index ranges from -1 to 1. A silhouette width above 0.5 is considered to indicate reasonable clustering (19).
Multinomial logistic regression analysis was conducted to identify which factors (i.e. age, smoking status, educational level, and physical activity level) were associated with cluster membership at baseline. Finally, one categorical variable with k levels (where k is the number of clusters) was specified, and the association between cluster membership and cancer incidence was assessed by the Cox proportional hazards model without and with adjustment for the aforementioned baseline characteristics. The proportional hazards assumption was checked using statistical tests based on the scaled Schoenfeld residuals. All models were stratified by sex. Analyses were performed in the R statistical package, v.3.4.0 (www.r-project.org) using the packages kohonen (13), mice (20), survival (21) and cluster (22). A two-tailed p <0.05 was considered significant.
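To make the two ideas in this paragraph concrete, the following is a minimal pure-Python sketch of an online SOM on a one-dimensional chain of seven units, followed by the average silhouette width. It is not the kohonen or cluster package implementation used in the study; the encoding of each participant as four 0/1 MRF indicators, the Gaussian neighbourhood, the linear decay schedules, and all function names and default parameters are assumptions chosen for illustration.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_som(data, n_units=7, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D SOM (n_units x 1 chain); returns the codebook vectors."""
    rng = random.Random(seed)
    codebook = [list(rng.choice(data)) for _ in range(n_units)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                     # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
            x = data[i]
            # Best matching unit: the codebook vector closest to x.
            bmu = min(range(n_units), key=lambda u: sq_dist(codebook[u], x))
            for u in range(n_units):
                # Units near the BMU on the chain move more towards x.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))
                codebook[u] = [w + lr * h * (xi - w) for w, xi in zip(codebook[u], x)]
            step += 1
    return codebook

def assign(data, codebook):
    """Cluster label of each row = index of its best matching unit."""
    return [min(range(len(codebook)), key=lambda u: sq_dist(codebook[u], x))
            for x in data]

def mean_silhouette(data, labels):
    """Average silhouette width (Euclidean); singleton clusters score 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, x in enumerate(data):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue
        # a: mean distance to own cluster; b: mean distance to nearest other cluster.
        a = sum(math.sqrt(sq_dist(x, data[j])) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.sqrt(sq_dist(x, data[j])) for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(data)

# Toy profiles (HFPG, HTC, HSBP, HBMI as 0/1); not study data.
toy = [(0, 0, 0, 0)] * 5 + [(0, 1, 0, 1)] * 5 + [(1, 1, 1, 1)] * 5
toy_labels = assign(toy, train_som(toy))
```

For each individual the silhouette is (b - a) / max(a, b), so values near 1 indicate tight, well-separated clusters and values near -1 indicate likely misassignment, which matches the -1 to 1 range described above.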

Baseline characteristics
After exclusions, the study sample consisted of 8593 individuals (3929 men and 4664 women) aged ≥30 years. The mean (standard deviation (SD)) age at cohort entry was 48.1 (13.1) and 46.5 (11.7) years for men and women, respectively. During a median of tumors of haematopoietic and lymphoid tissues (16%), male genital system (15%), urinary system (14%) and respiratory system (6%). Tumors of the female genital system (46%), digestive system (24%) and haematopoietic and lymphoid tissues (7%) were the three most common cancer site groups among females. Table 1 shows the baseline characteristics of the participants. In both genders, non-cancer subjects were younger and less educated and had a lower mean SBP. About 12.5% of men had no MRFs, and 32.3% had three or four MRFs.
The corresponding values were 8.6% and 40.4% in women.
The frequency and percentage of different MRFs in the population by each of the four MRFs are shown in Figure 2. For example, HBMI, HSBP and HTC were highly prevalent (more than 60%) among men and women with HFPG (Figure 2).

Number of clusters
We obtained seven apparent clusters in both men and women using a SOM of size 7×1 with a hexagonal topology. The average silhouette width of all clustered objects was 0.61 for men and 0.74 for women, indicating a reasonable clustering structure in men and a strong one in women (Figures 3 and 4). Table 2 shows the characteristics of the clusters in the male population. Cluster 1, comprising 17.1% of the male population with a mean age of 53.5 years, had the highest mean number of MRFs (3.4). All men (100%) in this cluster had HTC and HFPG. Also, 54% and 34.5% of men in this cluster had four and three MRFs, respectively. Cluster 7, with the lowest mean number of MRFs (0.4), comprised 21.4% of men with a mean age of 43.6 years. About 41.6% of men in this cluster had only one MRF (HTC). The highest and lowest incidence rates of cancer (5.1 and 0.6 per 1000 person-years) were found in cluster 3 and cluster 5, respectively. Table 3 presents the characteristics of the clusters of MRFs in the female population.

Description of the clusters in men and women
Cluster 7 contained 20.1% of women with a mean age of 53.6 years, of whom 33% had four and the remaining 67% had three MRFs. All women (100%) in this cluster had HFPG and HTC, and most of them (90%) had HBMI. Cluster 1 contained the women with the lowest mean number of MRFs (0.8). This cluster included 20.2% of women with a mean age of 42.4 years, of whom 43% had no MRFs.

Sociodemographic predictors of cluster membership
Results of the multinomial logistic regression analysis in men showed that age, physical activity level and smoking status were significantly associated with cluster membership at baseline (Table 4). With cluster 7 (the relatively normal group) as reference, men with low levels of physical activity were less likely than physically active participants to be in cluster 3 [odds ratio (OR): 0.69 (95% CI: 0.54-0.89)]. Older men were more likely than younger men to be in clusters 1, 2, 3 and 4 compared with cluster 7 (Table 4).
In women, age, levels of education and physical activity, and smoking status were significantly associated with cluster membership at baseline (Table 5). Older women were more likely than younger women to be in clusters 3, 4, 5, 6 and 7 compared with cluster 1 (Table 5).

Cluster membership and incident cancers
Among men, individuals in cluster 5 had the lowest incidence rate of cancer (Table 2) during a median follow-up of 13.9 years. The confounder-adjusted risk was more than three times higher in cluster 3 (hazard ratio (HR): 3.56, 95% CI: 1.23-10.28) compared with cluster 5 (Table 6).
Among women, subjects in cluster 2 had the lowest incidence rate of cancer (Table 3) during a median follow-up of 14.1 years. Women in cluster 6 had about 3.6 times the adjusted risk of cancer of women in cluster 2 (HR: 3.63, 95% CI: 1.46-8.99) (Table 6).
Finally, we repeated the analysis using the clusters with the highest incidence rate of cancer as reference groups. Accordingly, among men, only cluster 5 showed a lower risk of cancer (HR: 0.28, 95% CI: 0.09-0.80) compared with cluster 3 as reference. However, among women, all clusters except cluster 1 had a lower risk of cancer than women in cluster 6 as reference (Supplementary Table 1).
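Switching the reference group in the same fitted model inverts the hazard ratio and swaps the reciprocals of its confidence limits, which is why the two analyses in men tell the same story. The sketch below applies this to the reported HR for cluster 3 versus cluster 5; the function name `invert_hr` is hypothetical, and the small gap between the computed bounds and the reported 0.09-0.80 is what one would expect from rounding of the published values.

```python
def invert_hr(hr, lower, upper):
    """Re-express a hazard ratio and its CI with the reference group swapped."""
    if min(hr, lower, upper) <= 0:
        raise ValueError("hazard ratios and CI limits must be positive")
    # The reciprocal reverses the order of the confidence limits.
    return 1.0 / hr, 1.0 / upper, 1.0 / lower

# Cluster 3 vs cluster 5 in men: HR 3.56 (95% CI 1.23-10.28), from the text.
hr, lo, hi = invert_hr(3.56, 1.23, 10.28)
print(f"{hr:.2f} ({lo:.2f}-{hi:.2f})")  # prints 0.28 (0.10-0.81)
```

An HR of 0.28 corresponds to a 72% lower hazard (1 - 0.28), the figure quoted for cluster 5 in the discussion.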

Discussion
In our prospective study of 8593 Iranian adults with a median follow-up of 14 years, we identified the prevalence and clustering of four major MRFs, as well as the sociodemographic determinants of cluster membership. We also investigated how different clusters of MRFs were associated with higher or lower cancer risk. In both genders, seven distinct clusters of four MRFs were identified by SOM. These clusters differed substantially from each other in the total number of risk factors, in their associations with four sociodemographic factors (age, educational level, physical activity level and smoking status), and in the incidence of the composite of cancer types. Among men, cluster 3, in which 100% had HSBP, had a significantly greater risk of incident cancer than cluster 5, the reference group. Among women, cluster 6, in which 100% had HFPG, had a significantly higher risk of cancer than cluster 2, the reference group.
The present study found that the four MRFs, individually or in combination, are highly prevalent in the Iranian adult population, as we have previously shown (6). About 88% of men and 91% of women had at least one MRF, and 32% of men and 40% of women had three or four MRFs.
A large number of studies have previously examined the association between MRFs and cancer incidence. However, they have focused on only one MRF (23,24) or on a pre-defined constellation of factors such as metabolic syndrome (7). In contrast, this study extracted different patterns of MRFs and examined their association with cancer risk.

Clustering patterns in men
Among male participants, a relatively healthy subgroup (cluster 7) with the lowest number of MRFs was identified, in which 41.6% of subjects had only one MRF. We also identified two unhealthy subgroups (clusters 1 and 4), in which 100% had at least two MRFs. Association analysis showed that each 1-year increment in age was associated with about a 5% increase in the odds of being in clusters 1 to 4, which had a high number of MRFs. It is assumed that aging is the result of the accumulation of multiple forms of damage and pathology in different tissues (25).
Surprisingly, we found that smoking was associated with a lower chance of being in the unhealthy clusters (clusters 1 to 6) compared with the healthy cluster (cluster 7). In fact, our results suggest that smoking decreases the aggregation of MRFs. They are consistent with previous studies suggesting that smoking has a protective effect against some MRFs (26). The inverse association between smoking and clustering of MRFs might be attributable to diminished appetite and a rise in metabolic rate and, as a result, lower measures of abdominal obesity and blood pressure among smokers (26).
Recent studies showed a convincing association between metabolic syndrome, defined as the aggregation of three or more metabolic disorders, and certain types of cancer, including prostate (27) and breast (28) cancer. Interestingly, we did not find a clear relationship between the number of MRFs and cancer risk in men. For example, the highest incidence rate of cancer was observed in cluster 3, although its members had fewer MRFs than those in clusters 1 and 4.
Also, individuals in cluster 7 had the lowest number of MRFs; however, they had a higher incidence rate of cancer than those in cluster 5 (1.7 vs. 0.6 per 1000 person-years). Multivariate Cox regression analyses revealed that cluster 3 (with the highest incidence rate) had more than 3.5-fold the adjusted cancer risk of cluster 5 (with the lowest incidence rate). In the complementary analysis (Supplementary Table 1), individuals in cluster 5 had a 72% lower risk of cancer after adjustment for confounders compared with cluster 3. Several interesting findings emerge from the patterns identified among men. Firstly, comparison of the patterns in clusters 3, 5, 6 and 7 suggests that the combination of HBMI and HTC (cluster 5) is more protective against cancer than either of these two risk factors alone (clusters 6 and 7). Many studies have investigated the individual effects of BMI and TC on the risk of incident cancer, but research findings have been inconsistent. In a large pooled cohort of Australian adults, BMI was associated with the development of overall, colorectal and obesity-related cancers in men (29). Also, several studies have indicated that BMI was positively associated with prostate cancer risk (30). A Chinese cohort study of 133273 subjects showed that the association between BMI and cancer incidence varied by cancer site. Among men, underweight (BMI<18.5 kg/m²) increased the risk of gastric and liver cancer, and obesity (BMI≥28.0 kg/m²) increased the risk of colon cancer. However, overweight (BMI 24-28 kg/m²) showed a protective role in lung and bladder cancer incidence in males (31). Several biological mechanisms have been suggested for the association between HBMI and risk of various cancers. They include obesity-related hormones, growth factors, modulation of energy balance and calorie restriction, multiple signaling pathways, and inflammatory processes affecting cancer cell promotion and progression (32).
In recent years, HTC has been linked to the development of several different cancers, although the results are inconsistent. A number of studies have reported a positive association between TC and cancers (23,33). However, others found lower overall or site-specific cancer incidence in people with high TC levels (24,34). A Korean cohort study showed that a high TC level (≥240 mg/dL) was negatively associated with the risk of liver and stomach cancer in both men and women, and of lung cancer in men (24). It has been suggested that the observed inverse associations between TC levels and cancer risk reflect preclinical cancer or disease, through increased uptake of cholesterol by tumor cells, rather than a true causal effect of cholesterol levels (35). In the present study, we found that co-occurrence of HBMI and HTC was associated with a lower risk of cancer in men than either factor alone.
Another interesting finding emerges from the comparison of clusters 1, 2, 3 and 4 with the three other clusters (5, 6 and 7): the highest incidence of cancer was observed in the clusters in which all or most of the subjects had HSBP (clusters 1 to 4). Also, the patterns identified in clusters 3 and 4 suggest that HTC may modify the adverse effect of HSBP, because the risks of cancer were not significantly different between clusters 3 and 4 despite the 100% prevalence of HSBP and HTC in cluster 4. In some studies, arterial hypertension was associated with a higher risk of colorectal cancer (36), prostate cancer (7) and malignant melanoma (37). Arterial hypertension has also been closely linked with renal cell cancer development (38). There are many uncertainties regarding a possible relation between hypertension and cancer, mainly concerning cancer site specificity, sex, age and duration of the disease, as well as complex interactions with other factors such as smoking, BMI, diabetes, alcohol and diet (39). According to our analysis, the combination of HSBP and HBMI could be conceptualized as a very high-risk pattern for overall cancer incidence among Iranian men. From a public health perspective, these results are important because of the high prevalence of hypertension and obesity in the Iranian population (6, 40).

Clustering patterns in women
Among females, cluster 1 was found to be relatively healthier than the others, with the lowest mean number of MRFs (0.8). About 43% of women in this cluster had no MRFs. Furthermore, we found two clusters with multiple MRFs (clusters 5 and 7), in which 100% had at least two MRFs. Association analysis showed a positive relation between aging and clustering of MRFs, similar to the association found in men. In addition, moderate education decreased the chance of being in the unhealthy clusters (clusters 5, 6 and 7) compared with high education. This finding may be attributable to a sedentary lifestyle among highly educated women. One study reported that those with high education had lower total physical activity than those with moderate education (41). Interestingly, we found that smoking decreased the chance of being in cluster 5, in which all individuals had HBMI, HSBP and HTC. This suggests a protective effect of smoking on some MRFs (26), as discussed in the previous section.
In women, the highest incidence rate of cancer was observed in cluster 6, although its number of MRFs was smaller than in clusters 5 and 7. Thus, unlike some previous studies (28), our findings did not show a clear relationship between the number of MRFs and cancer risk in women. Multivariate Cox regression showed that cluster 6 (with the highest incidence rate) had about a 3.6-fold increased risk of cancer compared with cluster 2 (with the lowest incidence rate). Furthermore, cluster 1 had about a 2.2-fold increased risk compared with cluster 2 (marginally significant). In the complementary analysis (Supplementary Table 1), we found that all clusters except cluster 1 had a significantly lower adjusted risk of cancer than cluster 6. Some important conclusions emerge from these findings. Firstly, healthy overweight or obese women (cluster 2) showed the lowest overall cancer risk.
Evidence has suggested that BMI is an important predictor of cancer risk (42): a population-based cohort study of 5.24 million UK adults showed associations between increased BMI and certain types of cancer (43). In a meta-analysis of 221 datasets, positive associations were reported between HBMI and cancers of the oesophagus, thyroid, colon, kidneys, endometrium, and gallbladder; in contrast, increased BMI was negatively associated with lung cancer (44).
The relation between BMI and cancer is complex and is not yet fully understood. For example, some studies have shown that increased BMI is associated with an increased risk of breast cancer in women after menopause (45); however, a meta-analysis showed that BMI had no significant effect on the incidence of breast cancer during the premenopausal period (46). In our study, the lowest incidence of cancer in cluster 2 may be due to age, as this cluster was the youngest group (mean age of 38 years) among the seven clusters.
Very few studies have examined the effects of various combinations of BMI and other MRFs on cancer risk. Our study showed that healthy overweight/obese women (cluster 2) had the lowest risk of cancer, and that the risk increased significantly only when HFPG was added to HBMI (clusters 6 and 1). HFPG was present in all individuals (100%) in cluster 6 and in 7.7% of individuals in cluster 1; in contrast, nobody in cluster 2 had HFPG, reinforcing the positive association between HFPG and cancer risk.
Many observational studies suggest that people with pre-diabetes and diabetes are at a significantly higher risk of some types of cancer (47), but the links between them are incompletely understood. A prospective cohort study of 1298385 Koreans (468615 women) aged 30 to 95 years reported significant positive associations between fasting serum glucose and cancers of the liver and cervix in women (48). One cohort study in Scotland showed significantly increased risks of pancreatic, liver and colon cancer in the whole population, whereas no significant association was found between diabetes and overall cancer (49). In conclusion, pre-diabetes/diabetes and cancer have a complex relationship that requires more clinical attention and better-designed studies.
Interestingly, all individuals in cluster 7 also had HFPG; however, no statistically significant difference in cancer risk was observed between cluster 7 and cluster 2.
Unlike cluster 2, all individuals in cluster 7 had HTC, which suggests a protective role of HTC against cancer risk, as discussed in the previous section.
Several limitations should be noted. First, MRFs were measured only once at cohort entry, so we were unable to assess changes over time. Second, because of the small number of cancer cases, we were unable to stratify results by cancer site. Third, the Iranian background of study participants may limit the generalizability of our findings to more ethnically diverse populations. Strengths of our study include its long duration of follow-up and population-based sample. Also, a comprehensive physical exam and questionnaire were completed at cohort entry, and complete and reliable outcome data were obtained through the outcome committee. This is the first study to identify multiple clusters of MRFs using SOM in a well-characterized cohort of the Iranian population.

Conclusions
Clustering of MRFs is common in Iran. The majority of men had more than one metabolic risk factor. Multiple modifiable factors such as educational level, physical activity level and smoking are associated with the clustering of MRFs. In general, a gradient between the number of MRFs and cancer risk was not observed in either men or women. Instead, some combinations of the four MRFs were significantly associated with an increased risk of overall cancer. Our study shows that co-occurrence of HBMI and HFPG in women, and of HSBP and HBMI in men, are powerful indicators of overall cancer risk. However, HBMI in combination with HTC has a protective effect against cancer development in both genders. These findings suggest that the combined information from a few variables related to cancer development is superior to measurement of only one metabolic risk factor. The cumulative and clustered nature of MRFs helps identify potential mechanisms of cancer and modifiable factors that can serve as important targets for intervention initiatives.

Ethics approval and consent to participate
This study was approved by the ethical committee of the Research Institute for Endocrine Sciences of Shahid Beheshti University of Medical Sciences, Tehran, Iran, and informed written consent was obtained from all participants.

Consent for publication
Not applicable.

Availability of data and material
The datasets analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
This study was conducted in the framework of the Tehran Lipid and Glucose Study (TLGS) and was supported by grant No. 121 from the National Research Council of the Islamic Republic of Iran. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Results obtained from the multinomial logistic regression. Cluster 7, with the lowest mean number of MRFs (Table 2), was considered as the reference group in the multinomial regression analysis. MRFs: metabolic risk factors; OR: odds ratio; CI: confidence interval.

Results obtained from the multinomial logistic regression. Cluster 1, with the lowest mean number of MRFs (Table 3), was considered as the reference group in the multinomial regression analysis. MRFs: metabolic risk factors; OR: odds ratio; CI: confidence interval.

In the Cox regression models, cluster 5 in men and cluster 2 in women, with the lowest incidence rates of cancer (Tables 2 and 3), were considered as reference groups. CI: confidence interval; HR: hazard ratio.