Supervised classification of gene expression discriminates diabetic patients from healthy ones
In this study, gene expression data from newly diagnosed type 2 diabetic patients were analyzed using supervised and unsupervised machine learning approaches. At the supervised level, we aimed to identify a set of genes whose expression in skeletal muscle was dysregulated in most patients and could potentially discriminate normoglycemic from type 2 diabetic individuals.
The gene set comprises genes such as FGFBP3, CERK, ETV5, E2F8, and MAFB, as well as non-coding RNAs, which may be used in future studies and in the development of treatment strategies. Notably, injection of FGFBP3 has been patented as a treatment for diabetes, obesity, and nonalcoholic fatty liver disease [23, 24]. That invention demonstrated that a single injection of the FGFBP3 protein could regulate the blood glucose level and keep it in the healthy range for more than 24 hours. CERK plays an important role in inflammation-associated diseases. In CERK-null mice, CERK deficiency suppressed the obesity-mediated elevation of inflammatory cytokines, attenuated the high-fat-diet-induced decrease in the insulin receptor, GLUT4, and adiponectin, and improved glucose intolerance. Studies also indicate a relationship between diet, obesity, and the expression of ETV5, which participates in food intake control mechanisms. Moreover, impaired glucose tolerance in obese individuals has been found to be associated with the up-regulation of E2F8; this gene is therefore possibly implicated in the progression of obesity, glucose intolerance, and their complications. MAFB is also linked to metabolism and to the development of obesity and diabetes. MAFB-deficient mice exhibited higher body weights and faster rates of body weight increase than control mice. Up-regulation of MAFB expression in human adipocytes is correlated with adverse metabolic features and inflammation, which may lead to the development of insulin resistance. In addition to the protein-coding genes, we found that about 40 percent of the top-ranked genes are non-coding RNAs, including pseudogenes and long non-coding RNAs (lncRNAs). Recent studies have revealed that the deregulation of pseudogenes and lncRNAs can be related to diabetes [31, 32]. In this analysis, additional non-coding candidates were found, which strengthens the case for a role of lncRNAs in complex diseases such as diabetes. 
These non-coding RNAs can be functionally analyzed to understand their biological roles in the pathology of T2DM.
Unsupervised classification of the gene expression profile of diabetic patients reveals the potential existence of molecular subtypes
The objective of the analysis at the unsupervised level was to identify different gene expression patterns among T2DM patients that potentially lead to insulin resistance through different mechanisms. In this part, the diabetic samples were categorized into three clusters, and the specific dysregulated genes and pathways in each cluster were reported. The purpose of this analysis was to show that, because of the heterogeneous and multifactorial nature of this disease, gene expression dysregulation is not necessarily the same in all diabetic people. Thus, patients can be clustered into subgroups with different patterns of gene expression dysregulation. We then attempted to model the subsequent effects of these gene expression dysregulations on metabolism. We do not claim that these transcriptional differences lead to the manifestation of different clinical features, such as fasting glucose and insulin levels, in these clusters. Moreover, we only investigated the potential existence of molecular subtypes in T2DM and did not define specific subtypes, as accurate subtyping requires more data from more individuals, validation with an independent data set, and experimental verification.
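To make the idea of grouping patients by expression pattern concrete, the following is a minimal sketch only. It assumes plain k-means (Lloyd's algorithm) on a toy matrix of log-fold-changes; the clustering method, the function name `kmeans`, the two genes, and all values are illustrative assumptions and are not taken from the analysis described here.

```python
from statistics import mean


def kmeans(samples, init_centroids, n_iter=20):
    """Plain k-means (Lloyd's algorithm) on lists of expression values."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(samples)
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(s, centroids[k])),
            )
            for s in samples
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [mean(col) for col in zip(*members)]
    return labels, centroids


# Hypothetical log-fold-changes versus healthy controls for two genes
# (columns) in nine patients (rows); the values are made up.
patients = [
    [2.1, 0.1], [1.9, -0.2], [2.0, 0.0],     # gene A up-regulated
    [0.1, 0.0], [-0.1, 0.2], [0.0, -0.1],    # close to healthy
    [0.0, -2.0], [0.2, -1.8], [-0.1, -2.2],  # gene B down-regulated
]

# Fixed starting centroids keep the toy example deterministic.
labels, centroids = kmeans(
    patients, init_centroids=[patients[0], patients[3], patients[6]]
)
print(labels)
```

Each patient receives a cluster label, and the cluster centroids summarize the average dysregulation pattern of each subgroup. With real data, the number of clusters and the stability of the assignment would need to be validated, as noted above.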
Cluster 1: mitochondrial dysfunction, oxidative stress, and inflammation
In cluster 1, the perturbed pathways and dysregulated genes possibly represent perturbation of lipid and free fatty acid (FFA) metabolism, inflammation, oxidative stress, and mitochondrial dysfunction. Perturbation of the pentose phosphate pathway, folate metabolism, and glutathione metabolism, as well as dysregulation of genes such as IGHA1, IGHA2, GADD45G, and DDIT4, points to inflammation and oxidative stress. The up-regulation of IGHA1 and IGHA2 may start an inflammatory cascade involving a neutrophilic response, phagocytosis, the oxidative burst, and subsequent tissue damage. In addition, GADD45G, which acts as a stress sensor, is overexpressed in this group. DNA damage and energy stress can also activate DDIT4 expression, and this gene is thus implicated in the regulation of reactive oxygen species (ROS). Oxidative stress may impair mitochondrial function, which possibly leads to impairment of insulin sensitivity. Some evidence supports the role of oxidative stress and mitochondrial dysfunction in the pathogenesis of insulin resistance and type 2 diabetes. In diabetes mellitus, mitochondria are the major source of oxidative stress. Free radicals can damage lipids, proteins, and DNA and play a role in diabetes complications. Down-regulated mitochondrial genes and lowered flux through oxidative phosphorylation may indicate mitochondrial dysfunction in this cluster. Furthermore, MIF, a proinflammatory cytokine, is up-regulated in this cluster. A positive association of MIF plasma levels with FFA concentration and insulin resistance has been shown. Perturbation of FFA metabolism, which leads to an increase in FFAs, was observed in this cluster. Evidence has shown that FFAs can induce insulin resistance in skeletal muscle. FFAs may induce insulin resistance via mitochondrial dysfunction, increased ROS production and oxidative stress, and activation of inflammatory signals, all of which were observed in this cluster. An increase in FFAs is associated with a decrease in adiponectin. 
Adiponectin, encoded by ADIPOQ, is mainly known as an adipokine, but the importance of adiponectin production in muscle cells has also been demonstrated. That study also reported an increased level of adiponectin expression in response to rosiglitazone treatment in muscle cells and confirmed the functional role of muscle adiponectin in insulin sensitivity. Adiponectin contributes to the glucose metabolism of muscle cells via increased insulin-induced serine phosphorylation of protein kinase B and inhibition of the inflammatory response. Moreover, in this cluster, abnormalities in inositol phosphate metabolism with myo-inositol deficiency were observed. Myo-inositol, one of the inositol isomers, participates in signal transduction and vesicle trafficking and is associated with glucose utilization. Clinical reports suggest that the administration of inositol supplements is a therapeutic approach in insulin resistance and improves glucose metabolism. Figure S4 in Additional file 1 shows an overview of the abnormalities in this cluster.
Cluster 2: ER-stress and inflammation
Surprisingly, no significantly dysregulated pathway was found in the second cluster. Therefore, we compared the phenotypic features of people in each cluster with those of healthy individuals. Interestingly, this cluster is very close to a healthy state in terms of blood glucose and insulin levels. People in this group are therefore possibly at an early stage of diabetes onset, with no apparent change in their metabolism yet. However, differential gene expression analysis revealed changes in the expression of non-metabolic genes in this cluster, such as overexpression of OPN, OPG, CHAC1, and ERN1 and down-regulation of SERCA1. These genes are related to diabetes through the promotion of ER stress and inflammation. OPN and OPG play roles in inflammation, insulin resistance, prediabetes, and diabetes. Recent studies demonstrated that OPN and OPG levels are increased in pre-diabetic subjects and that alterations in OPN and OPG might be involved in the pathogenesis of prediabetes and T2DM [41, 42]. Obese mice lacking osteopontin showed improved whole-body glucose tolerance and insulin resistance, along with decreased markers of inflammation. In addition, ER stress can induce the expression of OPN and OPG. Recent evidence supports the presence and role of ER stress in muscle [44-46]. In this cluster, SERCA1, an intracellular membrane-bound Ca2+-transport ATPase encoded by the ATP2A1 gene, was down-regulated. Dysregulation of SERCA promotes ER stress. SERCA1 resides in the sarcoplasmic or endoplasmic reticulum of muscle cells and helps keep cellular Ca2+ homeostasis within the physiological range. Lower SERCA expression may lead to reduced Ca2+ accumulation in the ER lumen and to ER dysfunction. A high luminal calcium concentration is essential for proper protein folding and processing, and Ca2+ depletion can result in the accumulation of unfolded proteins and trigger the unfolded protein response (UPR) and cell death. 
A high-fat diet and obesity induce ER stress in muscle and subsequently suppress insulin signaling. Antidiabetic compounds such as azoramide and rosiglitazone have been demonstrated to induce SERCA expression and increase ER Ca2+ accumulation [49, 50]. A schematic representation of the abnormalities in cluster 2 is shown in Figure S5 of Additional file 1.
Cluster 3: perturbation in IRS-mediated insulin signaling
In cluster 3, the differential gene expression analysis revealed perturbation of insulin signaling and inflammation. The results showed down-regulation of the insulin-responsive genes HK2, EGR1, and CIDEC, which indicates insulin resistance through deficient insulin signaling. Furthermore, overexpression of MSTN and ERBB3 was found. Myostatin induces insulin resistance by degrading IRS1 protein and diminishing insulin-induced IRS1 tyrosine phosphorylation, thus interrupting the insulin signaling cascade. In addition, treating HeLa cells with myostatin suppressed hexokinase 2 expression. Evidence has revealed that stress-induced transactivation of ERBB2/ERBB3 receptors triggers a PI3K cascade leading to the serine phosphorylation of IRS proteins [54, 55]. Overexpression of ERBB3 may enhance PI3K activity, implicating ERBB proteins in stress-induced insulin resistance. Taken together, MSTN and ERBB3 can lead to serine phosphorylation of IRS proteins, reduced tyrosine phosphorylation of IRS proteins, and their degradation. Since the expression of insulin-regulated genes is positively correlated with insulin sensitivity, the down-regulation of HK2, EGR1, and CIDEC in this group possibly reflects insulin resistance through deficient insulin signaling. In addition, the metabolic analysis showed lower phosphorylation of glucose, with a subsequent lowering of glycolysis and TCA cycle fluxes. Moreover, dysmetabolism of branched-chain amino acids (BCAAs) was observed in the metabolic analysis. A proposed mechanism linking higher levels of BCAAs and T2DM involves leucine-mediated activation of the mammalian target of rapamycin complex 1 (mTORC1). This activation results in the serine phosphorylation of IRS1 and IRS2 and the subsequent uncoupling of insulin signaling at an early stage. A brief representation of the abnormalities in this cluster is shown in Figure S6 of Additional file 1.
The cluster-based study can improve understanding of T2DM
Our analysis showed that, at the early stage of diabetes, the changes in skeletal muscle gene expression relative to healthy subjects are small. Moreover, clustering the patients leads to the identification of abnormalities that are usually hidden in cohort studies. For example, dysregulation of genes such as MIF, ATP2A1, GADD45G, EEF2, EGR1, CIDEC, and MSTN, and of several other genes and pathways such as BCAA metabolism, folate metabolism, and the pentose phosphate pathway, was observed in our cluster analysis, whereas the cohort-level comparison between normoglycemic and all diabetic individuals did not identify them as differentially expressed genes or dysregulated pathways. In a cohort study, a sample consisting of several subjects (63 diabetic individuals in the original study) is gathered and examined as a whole (Figure S7 of Additional file 1). This makes it possible to see only an approximate average of the features in the sample, and as a result some of the abnormalities are masked. In a cluster-based study, the sample collected in a cohort study is broken down into sub-groups so that the members within each sub-group are as similar as possible to one another and differ from the members of the other sub-groups. Each sub-group is then analyzed individually (e.g., here we divided the diabetic group into three sub-groups). The cluster-based analysis in this study led to the identification of additional dysregulated genes and pathways that are specific to each cluster. Therefore, for a progressive and heterogeneous disease like T2DM, applying a cluster-based study can enhance our understanding of the factors involved in the disease. Focusing on homogeneous sub-groups in a heterogeneous disease such as T2DM may improve the success of therapeutic strategies.
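The masking effect of cohort-level averaging can be illustrated with a small numerical sketch. All expression values, the cluster sizes, and the cutoff `THRESHOLD` below are hypothetical and chosen only for illustration; a simple difference of means stands in for a proper differential expression test.

```python
from statistics import mean

# Hypothetical expression values of one gene (arbitrary units).
healthy = [5.0, 5.2, 4.8, 5.1, 4.9]
diabetic_by_cluster = {
    "cluster 1": [7.0, 7.2, 6.8],  # gene up-regulated in this subgroup only
    "cluster 2": [5.0, 5.1, 4.9],  # unchanged
    "cluster 3": [5.1, 4.9, 5.0],  # unchanged
}

# Assumed cutoff on the mean difference for calling a gene dysregulated.
THRESHOLD = 1.0


def mean_shift(group, reference):
    """Difference between the mean of a group and the mean of the reference."""
    return mean(group) - mean(reference)


# Cohort-level view: all diabetic subjects pooled into one group.
pooled = [v for values in diabetic_by_cluster.values() for v in values]
pooled_shift = mean_shift(pooled, healthy)

# Cluster-level view: each subgroup compared with healthy subjects separately.
cluster_shifts = {
    name: mean_shift(values, healthy)
    for name, values in diabetic_by_cluster.items()
}
flagged = [name for name, shift in cluster_shifts.items() if abs(shift) >= THRESHOLD]

print("pooled shift:", round(pooled_shift, 2), "flagged:", abs(pooled_shift) >= THRESHOLD)
print("clusters flagged:", flagged)
```

In this toy case the pooled shift falls below the cutoff because two unchanged subgroups dilute the signal, while the per-cluster comparison recovers the dysregulation in the one subgroup that carries it.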