Supervised classification discriminates diabetic patients from healthy ones
In this study, gene expression data from newly diagnosed type 2 diabetic patients were analyzed using supervised and unsupervised machine learning approaches. At the supervised level, we aimed to identify a set of genes whose expressions were dysregulated in most patients and could potentially discriminate normoglycemic from T2DM individuals.
The gene set comprised genes such as FGFBP3, CERK, ETV5, E2F8, and MAFB, as well as non-coding RNAs, which may be used to study and develop novel T2DM treatments in the future. Notably, the injection of FGFBP3 has been patented as a treatment for diabetes, obesity, and nonalcoholic fatty liver disease [23, 24]. It has been demonstrated that a single injection of FGFBP3 regulates the blood glucose level and keeps it in the normal range for more than 24 hours. CERK plays an important role in inflammation-associated diseases. It has been observed that CERK deficiency in CERK-null mice suppresses the elevation of obesity-mediated inflammatory cytokines and improves glucose intolerance. Studies have also indicated a relationship between diet, obesity, and the expression of ETV5, which participates in food intake control mechanisms.
Moreover, it has been found that impaired glucose tolerance in obese individuals is associated with the up-regulation of E2F8, which is possibly implicated in the progression of obesity, glucose intolerance, and its complications. MAFB has also been linked to metabolism and to the development of obesity and diabetes. MAFB-deficient mice have exhibited higher body weights and a faster rate of increase in body weight than control mice. Up-regulation of MAFB expression in human adipocytes has been correlated with adverse metabolic features and inflammation, which may lead to the development of insulin resistance. In addition to the protein-encoding genes, we found that about 40 percent of the top-ranked genes were non-coding RNAs, including pseudogenes and long non-coding RNAs (lncRNAs). Recent studies have revealed that the deregulation of pseudogenes and lncRNAs can be related to diabetes [31, 32]. In the present analysis, additional non-coding candidates were found, supporting the role of lncRNAs in complex diseases like diabetes. These non-coding RNAs can be functionally analyzed to understand their biological roles in the pathology of T2DM.
Unsupervised classification of diabetic patients reveals the potential existence of molecular subtypes
The objective of the unsupervised analysis was to identify distinct gene expression patterns among T2DM patients that may lead to insulin resistance through different mechanisms. In this part, the diabetic samples were grouped into three clusters, and the dysregulated genes and pathways specific to each cluster were identified. This analysis suggests that, because of the heterogeneous and multifactorial nature of the disease, gene expression dysregulation is not necessarily the same in all diabetic individuals. Patients can therefore be clustered into subgroups with different patterns of gene expression dysregulation. We attempted to model the downstream effects of these gene expression dysregulations on metabolism. However, we do not claim that these transcriptional differences lead to different clinical features, such as fasting glucose and insulin levels, in these clusters. Moreover, we only investigated the potential existence of molecular subtypes in T2DM and did not define specific subtypes. Accurate subtyping requires data from additional individuals, validation with an independent data set, and experimental verification.
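As a purely illustrative sketch of grouping samples into three clusters, the Python below runs a plain k-means (Lloyd's algorithm) on made-up two-gene expression profiles. The choice of k-means, the data values, and the names `kmeans`, `samples`, and `initial_centroids` are assumptions for illustration only; this section does not restate which clustering method was applied in the study.

```python
import math


def kmeans(samples, centroids, iterations=20):
    """Plain Lloyd's k-means on tuples of floats; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assign each sample to its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(s, centroids[c]))
            for s in samples
        ]
        # Recompute each centroid as the mean of its assigned samples.
        new_centroids = []
        for c in range(len(centroids)):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the previous centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical standardized expression of two genes in nine patients.
samples = [
    (1.0, 0.1), (1.2, -0.1), (0.9, 0.0),
    (-1.0, 0.2), (-1.1, -0.1), (-0.9, 0.0),
    (0.0, 1.0), (0.1, 1.2), (-0.1, 0.9),
]
# Fixed starting centroids keep the example deterministic.
initial_centroids = [samples[0], samples[3], samples[6]]

labels, centroids = kmeans(samples, initial_centroids)
print(labels)
```

In practice the input would be a patients-by-genes matrix with many more features, and the number of clusters would need to be justified rather than fixed in advance as it is here.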
Cluster 1: Mitochondrial dysfunction, oxidative stress, and inflammation
In cluster 1, the perturbed pathways and dysregulated genes possibly represent perturbation of lipid and free fatty acid (FFA) metabolism, inflammation, oxidative stress, and mitochondrial dysfunction. Perturbation of the pentose phosphate pathway, folate metabolism, and glutathione metabolism, as well as dysregulated genes such as IGHA1, IGHA2, GADD45G, and DDIT4, indicate inflammation and oxidative stress. The up-regulation of IGHA1 and IGHA2 may trigger an inflammatory cascade involving a neutrophilic response, phagocytosis, the oxidative burst, and subsequent tissue damage. In addition, GADD45G, which acts as a stress sensor, is overexpressed in this group. DNA damage and energy stress can also activate DDIT4 expression; thus, this gene contributes to regulating reactive oxygen species (ROS). Oxidative stress may impair mitochondrial function, which possibly leads to impaired insulin sensitivity. Some evidence supports the role of oxidative stress and mitochondrial dysfunction in the pathogenesis of insulin resistance and type 2 diabetes. In diabetes mellitus, mitochondria are the major source of oxidative stress. Free radicals can damage lipids, proteins, and DNA and play a role in diabetes complications. The down-regulated mitochondrial genes and the perturbation in oxidative phosphorylation may indicate mitochondrial dysfunction in this cluster. Furthermore, MIF, a proinflammatory cytokine, is up-regulated in this cluster. A positive association has been reported between MIF plasma levels, FFA concentration, and insulin resistance. A perturbation of FFA metabolism that possibly leads to an increase in FFAs was observed in this cluster. Evidence has demonstrated that FFAs can induce insulin resistance in skeletal muscle. FFAs may induce insulin resistance via mitochondrial dysfunction, increased ROS production and oxidative stress, and activation of inflammatory signals, all of which were observed in this cluster.
An increase in FFAs is associated with a decrease in adiponectin. Adiponectin (ADIPOQ) is mainly known as an adipokine, but the importance of adiponectin production in muscle cells has also been demonstrated. The same study also reported increased expression of adiponectin in muscle cells in response to rosiglitazone treatment and confirmed the functional role of muscle adiponectin in insulin sensitivity. Adiponectin contributes to the glucose metabolism of muscle cells via increased insulin-induced serine phosphorylation of protein kinase B and inhibition of the inflammatory response. Moreover, abnormalities in inositol phosphate metabolism with myo-inositol deficiency were observed in this cluster. Myo-inositol, one of the inositol isomers, participates in signal transduction and vesicle trafficking and is associated with glucose utilization. Clinical reports have suggested that the administration of inositol supplements is a therapeutic approach in insulin resistance and improves glucose metabolism. Figure S4 in Additional file 1 shows an overview of the abnormalities in this cluster.
Cluster 2: ER-stress and inflammation
Surprisingly, no significantly dysregulated pathway was found in the second cluster. Therefore, we compared the phenotypic features of the individuals in each cluster with those of healthy individuals. Interestingly, this cluster is very similar to the healthy state with respect to blood glucose and insulin levels. Individuals in this group may therefore be at an early stage of diabetes onset, with no apparent change in their metabolism yet. However, differential gene expression analysis revealed changes in the expression of non-metabolic genes in this cluster (e.g., overexpression of OPN, OPG, CHAC1, and ERN1, and down-regulation of SERCA1). These genes are related to diabetes through the promotion of ER stress and inflammation. OPN and OPG play roles in inflammation, insulin resistance, prediabetes, and diabetes. Recent studies have demonstrated that OPN and OPG levels are increased in pre-diabetic subjects and that alterations in OPN and OPG might be involved in the pathogenesis of prediabetes and T2DM [41, 42]. Obese mice lacking osteopontin have shown improved whole-body glucose tolerance and insulin resistance, along with decreased markers of inflammation. In addition, ER stress can induce the expression of OPN and OPG. Recent evidence supports the presence and role of ER stress in muscle [44-46]. In this cluster, SERCA1, an intracellular membrane-bound Ca2+-transport ATPase encoded by the ATP2A1 gene, was down-regulated. The dysregulation of SERCA promotes ER stress. SERCA1 resides in the sarcoplasmic or endoplasmic reticulum of muscle cells and contributes to maintaining cellular Ca2+ homeostasis within the physiological range. Lower SERCA expression may lead to reduced Ca2+ accumulation in the ER lumen and to ER dysfunction. A high luminal calcium concentration is essential for proper protein folding and processing.
Ca2+ depletion can result in the accumulation of unfolded proteins and can trigger the unfolded protein response (UPR) and cell death. A high-fat diet and obesity induce ER stress in muscle and subsequently suppress insulin signaling. Antidiabetic compounds such as azoramide and rosiglitazone have been shown to induce SERCA expression and increase the accumulation of Ca2+ in the ER [49, 50]. A schematic representation of the abnormalities in cluster 2 is shown in Figure S5 of Additional file 1.
Cluster 3: Perturbation in IRS-mediated insulin signaling
In cluster 3, the differential gene expression analysis revealed perturbation in insulin signaling and inflammation. The results showed down-regulation of the insulin-responsive genes HK2, EGR1, and CIDEC, which is consistent with insulin resistance arising from deficient insulin signaling. Furthermore, overexpression of MSTN and ERBB3 was found. Myostatin has been shown to induce insulin resistance by degrading IRS1 proteins and diminishing insulin-induced IRS1 tyrosine phosphorylation, thus interrupting the insulin signaling cascade. In addition, treating HeLa cells with myostatin suppressed HK2 expression. Evidence has revealed that stress-induced transactivation of ERBB2/ERBB3 receptors triggers a PI3K cascade leading to the serine phosphorylation of IRS proteins [54, 55]. Overexpression of ERBB3 may enhance PI3K activity, implicating ERBB proteins in stress-induced insulin resistance. Taken together, MSTN and ERBB3 can lead to serine phosphorylation of IRS proteins, which reduces their tyrosine phosphorylation and promotes their degradation. Since the expression of insulin-regulated genes is positively correlated with insulin sensitivity, the down-regulation of HK2, EGR1, and CIDEC in this group possibly reflects this loss of insulin signaling. In addition, the metabolic analysis showed lower phosphorylation of glucose with subsequent perturbation in the glycolysis and TCA pathways, as well as dysmetabolism of branched-chain amino acids (BCAAs). A mechanism involving leucine-mediated activation of the mammalian target of rapamycin complex 1 (mTORC1) has been proposed to link higher levels of BCAAs and T2DM. This activation results in the serine phosphorylation of IRS1 and IRS2 and the subsequent uncoupling of insulin signaling at an early stage. A brief representation of the abnormalities in this cluster is shown in Figure S6 of Additional file 1.
The cluster-based study can improve understanding of T2DM
Our analysis showed that, at the early stage of diabetes, the associated changes in gene expression in skeletal muscle are small relative to healthy subjects. Moreover, clustering the patients allows the identification of abnormalities that are usually hidden in cohort studies. For example, dysregulation of genes such as MIF, ATP2A1, GADD45G, EEF2, EGR1, CIDEC, and MSTN, and perturbations in several reactions implicated in BCAA metabolism, folate metabolism, and the pentose phosphate pathway were observed only in our cluster-based analysis. In a cohort study, a sample consisting of several subjects is gathered and examined as a whole (Figure S7 of Additional file 1). This makes it possible to see only an approximate average of the features in the sample, and as a result, some of the abnormalities are masked. In a cluster-based study, the sample collected in a cohort study is broken down into subgroups so that the members within each subgroup are as similar as possible to each other and differ from the members of the other subgroups. Each subgroup is then analyzed individually (e.g., here we divided the diabetic group into three subgroups). The cluster-based analysis in this study identified additional dysregulated genes and pathways that are specific to each cluster. Therefore, for a progressive and heterogeneous disease like T2DM, applying a cluster-based study will enhance our understanding of the factors involved in the disease pathogenesis. Focusing on homogeneous subgroups in a heterogeneous disease such as T2DM may improve the success of therapeutic strategies.
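The masking effect of cohort-level averaging can be illustrated with a small numerical sketch. The Python below uses invented log2 expression values for a single hypothetical gene; the numbers, the cluster labels A to C, and the helper `mean_shift` are assumptions for illustration and are not data from this study.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene (illustrative numbers only).
healthy = [5.0, 5.1, 4.9, 5.0, 5.0, 5.1, 4.9, 5.0, 5.0]

# Hypothetical patient subgroups: the gene is up-regulated in A,
# down-regulated in B, and unchanged in C.
clusters = {
    "A": [6.0, 6.1, 5.9],
    "B": [4.0, 3.9, 4.1],
    "C": [5.0, 5.1, 4.9],
}


def mean_shift(group, reference):
    """Difference between the group mean and the reference mean."""
    return mean(group) - mean(reference)


# Cohort view: pool all patients and compare with healthy subjects.
all_patients = [value for values in clusters.values() for value in values]
cohort_shift = mean_shift(all_patients, healthy)

# Cluster view: compare each subgroup with healthy subjects separately.
cluster_shifts = {
    name: mean_shift(values, healthy) for name, values in clusters.items()
}

print(f"cohort-level shift: {cohort_shift:+.2f}")
for name, shift in cluster_shifts.items():
    print(f"cluster {name} shift: {shift:+.2f}")
```

With these made-up values the pooled shift is approximately zero, while clusters A and B deviate by about one log2 unit in opposite directions, which is the kind of subgroup-specific signal that a pooled comparison cannot show. A real analysis would use a proper differential expression test rather than a raw difference of means.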