Identication of hub-genes, lncRNAs, and skin cancer inducer genes through meta-analysis of Hidradenitis Suppurativa disease transcriptome data.

This research is aimed to explore the molecular response of the skin cells to Hidradenitis Suppurativa (HS). Microarray transcriptome data for patients with HS, merged and employed to identify the differentially expressed genes (DEGs), gene ontology (GO), and long non-coding RNAs (lncRNAs). A protein-protein interaction network, and consequently, a co-expression network, was constructed for the essential genes. The key genes' clinical relevance was also established by survival analysis, relative expression level, immunohistochemistry, and immune inltration correlation analysis. Finally, potential herbal ingredients with inhibitory effects on cancer inducer proteins were characterized using the Traditional Chinese Medicine (TCMD) database. The chemokine signaling pathway was the gene ontology of the most signicant cluster found by clusterONE. Myocardial Infarction Associated Transcript (MIAT) and RNA Polymerase II Subunit J4, Pseudogene (POLR2J4) are the predicted lncRNAs for up-and down-regulated genes. Interleukin 6 (IL6), Formyl Peptide Receptor 2 (FPR2), C-X-C Motif Chemokine Ligand 10 (CXCL10), C-C Motif Chemokine Receptor 7 (CCR7), and C-C Motif Chemokine Ligand 5 (CCL5) genes were predicted as hub-genes. Tryptophan 2,3-Dioxygenase (TDO2), Serpin Family B Member 4 (SERPINB4), and Matrix Metallopeptidase 3 (MMP3) were demonstrated to be potential cancer inducers due to their high expression level in HS disease. These genes also were positively correlated with dendritic cells, T cell CD4+, monocytes, and neutrophils in the skin cancer microenvironment. Piceatannol, Salicylic acid, and magnolol herbal ingredients were predicted as potential compounds with inhibitory effects on SERPIN B4, MMP3, and TDO2, respectively. The output of the present study will aid in a better understanding of the HS disease and consequent cancer induction mechanism.


Introduction
Hidradenitis suppurative (HS) is a skin disease characterized by painful nodules, abscesses, and draining sinus tracts in the intertriginous area [1,2]. Patients' quality of life is impaired by this skin condition, and its high frequency results in exorbitant costs [3]. The assumption that microorganisms or skin infections are the rst causes of HS has been debunked [4]. The existence of follicular hyperkeratosis has been validated in previous articles to establish the molecular mechanism of HS, and perifolliculitis is considered the earliest occurrence detected in the HS condition [5]. A rupture of the follicular epithelium is the initial in ammatory symptom in HS, followed by a stream of outer body material into the dermis, including bacteria, sebum products, and hair [6]. Bacterial colonization in this environment might induce chronic in ammation [7,8]. This illness can be treated in various ways, including pharmaceutical and surgical alternatives [2,9].
Although the molecular link between HS and overall cancer risk is recognized [10][11][12], the detailed molecular mechanism of cancer induction through this skin condition is a fascinating subject in disease and cancer care. Chronic in ammation and infections, which can lead to proliferative epidermal changes, could have begun the tumor [13,14]. The higher recurrence of skin malignancies, which are the site of persistent in ammation and bacterial colonization, supports the concept that the illness and neoplasia are connected. Skin cancer has been interpreted to start in chronic wounds, chronic in ammatory processes, or sclerosing diseases [15,16]. Genetic background, whole transcriptome analysis, miRNA regulatory elements expression, and cytokine expression analysis have all been employed to elucidate the mechanism of HS at the genomic and proteomic levels [17][18][19].
Bioinformatics tools could be utilized by researchers to acquire a better knowledge of molecular causes of illness. Microarray data can be leveraged to de ne a cell's transcript pattern under speci c conditions [20]. To characterize the centrality of proteins in a specialized network in response to varied conditions, numerous approaches have been created. Creating different networks, such as PPI and co-expression networks, to lessen the rami cations of biological processes could shed light on the testimonies of the essential genes, product proteins, related pathways, and illness management [21][22][23]. A considerable amount of transcriptome data is being generated as a result of technological improvements in the life sciences eld, which offers a relevant situation for transcriptome data meta-analysis. This technique will provide a more detailed grasp of biological principles and aid in removing false-positive discoveries. The accuracy and reproducibility of the results improved with meta-analysis [24,25].
In this study, a meta-transcriptome analysis was done on the GSE72702, GSE128637, and GSE148027 datasets, and gene ontology analysis was used to further characterize them. The major clusters, hubgenes, and lncRNAs in response to HS were further identi ed using a PPI network and centrality analysis.
The survival study of the genes in skin cancer and the putative link in skin cancer induction were determined. Finally, potential herbal ingredients with inhibitory effects on the cancer inducer proteins were identi ed.

Transcriptome Data collection
For accurate gene expression pattern analysis, three different microarray transcriptome data sets (GSE72702, GSE128637, GSE148027) were downloaded from the NCBI Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/) and analyzed to determine the differentially expressed genes among the patients. The GSE72702 included 30 samples, 17 of which were for lesions and 13 of which were healthy. There were a total of 21 samples in the GSE128637, with 10 HS samples and 11 healthy samples. The GSE148027 data comprised 25 samples, 18 of which were related to the lesion and 8 of which were assigned to healthy tissue.

Preprocessing and (meta) analysis of transcriptome data
Initially, the Linear Models for Microarray Data "LIMMA" package in the R statistical language was used to scan the DEGs in single data analysis [26]. DEGs in each dataset were de ned as data with |log2FC (fold change) | > 2 and adj. P-value < 0.05. To re ect the common responses across the three datasets, a metaanalysis of transcriptome data was done utilizing the Integrative Meta-Analysis of GEO Data (ImaGEO) web-server following a single data analysis. To integrate the transcriptome data, the maximum P-value method was utilized, which produces more relevant results than effect size methods. A heat map plot utilizing the ImaGEO web-server displayed the combined data, and a Venn diagram described the up-and down-regulated genes across the three investigations [27].

GO and pathway enrichment analysis and lncRNA prediction.
The GO of up-and down-regulated genes was represented using the shinyGO server. In this database, the up-regulated genes are also submitted to Reactome and disease-related pathways prediction. The lncRNA prediction was performed for both up-and down-regulated genes [28]. The macro-modules of PPI and built co-expression network genes ontology were supplied via the Metascape service (https://metascape.org/). Only data having a signi cant adj. P-value < 0.05 was considered.
A context-based PPI network was built utilizing the STRING database better to de ne the effective regulation mechanisms among the up-regulated genes. The minimal score for interaction was selected at 0.7 to get more conclusive data from the structured network [29]. To display the built network, cluster characterization, and hub-gene prediction, Cytoscape version 3.7.1 was utilized [30]. Cluster prediction was made via Clustering with the Overlapping Neighborhood Expansion plug-in in Cytoscape [31]. The clusterONE plug-in is a graph clustering method that employs weighted graphs and creates overlapping groups. These features are bene cial for detecting protein complexes in protein-protein interaction networks with related con dence levels. The Hub-gene prediction was conducted using the Cytohubba plug-in with four distinct algorithms, including betweenness, proximity, Maximum Neighborhood Component (MNC), and degree [32].

Co-expression network of hub-genes and lncRNA
To further describe the role of hub-genes and subsequent pathways, the co-express network was created for hub-genes utilizing the PCViz web server (http://www.pathwaycommons.org/pcviz). The coexpression network for predicted lncRNA was generated by COXPRESdb, v 7.3 [33].

Pathological importance of hub-genes.
Because of the signi cant correlation between the HS patient and various cancers, pathological analysis was performed. The survival analysis of the hub-genes, top ten up-regulated genes, and lncRNA in skin cancer was described using the Kaplan-Meier plotter database [34]. The expression level of these genes in skin cancer was assessed by the GEPIA database [35]. The immunohistochemistry analysis of the genes in skin cancer was performed through the human protein atlas database [36]. The immune in ltration correlation analysis was achieved through the TIMER web-server [37].

Drug target prediction
The TCMID database was utilized to characterize the potential herb-based medicines having inhibitory effects on cancer inducer proteins. Using the target search platform in this database, the names of proteins of interest were inserted, and the potential herbal ingredients with inhibitory effects on the proteins' function were collected.

Transcriptome pattern of HS patients.
Microarray data analysis reveals that 2657 DEGs were discovered in GSE128637, 3457 DEGs were discovered in GSE72702, and 7141 DEGs were discovered in the GSE148027 dataset under the FDR 0.05 and adj. P-value < 0.05 thresholds. Figure 1 shows the top three up-and down-regulated genes in each data set. After removing the repeated DEG symbols from the Venn diagram, 34 genes were identi ed as common DEGs across the three studies. In GSE72702, the most signi cant expression levels were found in S100 Calcium Binding Protein A7A (S100A7A), SERPINB4, and Immunoglobulin Heavy Locus (IGH) genes. In GSE128637, the Immunoglobulin Lambda Like Polypeptide 1 (IGLL1), S100 Calcium Binding Protein A9 (S100A9), and S100 Calcium Binding Protein A8 (S100A8) genes had the highest expression levels. GSE148027 had the highest expression levels of Fc Receptor-Like 5 (FCRL5), Immunoglobulin Lambda Constant 1 (IGLC1), and Marginal Zone B and B1 Cell Speci c Protein (MZB1). An integrative meta-analysis of transcriptome data was performed to further characterize the common DEGs among the three datasets, and the results revealed that SERPINB4, TDO2, ADAM Like Decysin 1 (ADAMDEC1), FCRL5, S100A7A, C-X-C Motif Chemokine Ligand 13 (CXCL13), MMP3, TNF Receptor Superfamily Member 17 (TNFRSF17), S100A9, and C-X-C Motif Chemokine Ligand 6 (CXCL6) genes had the highest expression levels (Fig. 2).

Gene ontology and pathways
The list of up-and down-regulated genes was analyzed using GO in three categories after merging the data: biological process (BP), molecular function (MF), and cellular component (CC). The disease and Reactome terms associated with the up-regulated genes were investigated further. Some of the main BP GO terms for up-regulated genes include chemokine activity, cytokine activity, Serine type endopeptidase activity, metalloendopeptidase activity, response to biotic stimulus, innate immune response, leukocyte activation, response to cytokine, leukocyte chemotaxis, myeloid leukocyte activation, and response to the bacterium. The immunological synapse, cytoplasmic vesicle membrane, and secretory vesicle are basic terms in CC. In MF, important terms include cytokine binding, immune receptor activity, CCR chemokine receptor binding, CXCR, CXCR3 chemokine receptor binding, and endopeptidase activity. Reactome terms include interleukin signaling, cytokine signaling in immune systems, interleukin ten signaling, GPCR signaling, immune system, and interferon signaling. Macroglobulinemia, synovitis, celiac disease, Crohn's disease, and hyperphosphatemia are some of the most essential terms in disease (Fig. 3). The most downregulated genes in BP were in cell-cell signaling, response to endogenous stimulus, chemical homeostasis, organ development, tissue development, epiderm development, cell differentiation, and skin development. Cell body, vesicle, cytoskeleton, and myo brils were some of the CC terms used. Actin, cytoskeletal protein, and steroid-binding were all detected in MF. Ion channel transport, muscle contraction, keratinization, keratinocyte differentiation, epidermal cell differentiation, corni ed envelope formation, and cell junction organization were some of the terms found in the Reactome (Fig. 4).

Cluster, hub-genes, and lncRNA
Context-based PPI was constructed for the up-regulated DEGs as one of the well-known strategies for screening the main clusters and hub-genes. The interactive relationships between the candidate proteins were evaluated using the Cytoscape software. Nine signi cant clusters (adj. P-value < 0.05) were predicted within the constructed network using the clusterONE plug-in in Cytoscape. Figure 5 depicts the exact genes name. Chemokine-mediated signaling pathway, peptide ligand-binding receptor, calcium ion transport, formyl peptide receptors, G alpha signaling events, cellular calcium ion homeostasis, positive regulation of cytosolic calcium ion concentration, osteoclast differentiation, neutrophil degranulation, osteoclast genesis, Mononuclear cell migration, cellular response to interferon-gamma, regulation of cytosolic calcium ion concentration, was among the most signi cant GO terms predicted for the clusters.
Cytohubba was used to analyze nodes (hub-genes) in the constructed network using four different algorithms: betweenness, closeness, Maximum Neighborhood Component (MNC), and degree. IL6, CCR7, CCL5, CXCL10, and FPR2 are the most critical hub-genes in the network (Table 1). MIAT and POLR2J4 lncRNAs were predicted as the only lncRNAs for up-and down-regulated genes, respectively (Fig. 6).
Using the PCViz web server, a co-expression network for the ve top hub-genes was created (Fig. 7). Figure 8A shows the GO analysis of the co-expressed genes with hub-genes. In ammatory response, peptide ligand-binding receptor, interleukin 4, 10, 13, and 17 signaling, response to external stimulus, regulation of leukocyte migration, positive regulation of the MAPK cascade, and positive regulation of cell adhesion are some of the predicted terms. The MIAT lncRNA role in a network for T cell activation, PID IL12 2pathway, regulation of cytokine production, in ammatory bowel disease, cell adhesion molecules, macrophage activation, natural killer cell-mediated cytotoxicity, regulation of DNA binding, regulation of proteolysis, hepatitis C, antigen processing is revealed by the MIAT co-expression network and subsequent GO analysis (Fig. 8). 3.5. Analysis of hub genes, signi cant DEGs, and lncRNA and from a pathological standpoint.
The survival study of hub-genes was carried out, and as shown in Fig. 10, the increased expression level of hub-genes is directly associated with better survival of skin cancer patients. Among the hub-genes, the expression level of CCL5 and CXCL10 in skin cancer was signi cantly increased. TDO2, ADAMDEC1, FCRL5, CXCL13, and TNFRSF17 genes, on the other hand, exhibited an up-regulation in skin cancer, whereas the expression level of the MMP3 gene decreased signi cantly. In skin cancer, the expression level of MIAT was lower than in normal cells. On the other hand, all the genes showed a better survival rate of patients if they are up-regulated except TDO2, SERPINB4, and MMP3 (Fig. 10). These genes demonstrated a better survival rate of patients if they are down-regulated. Interestingly, in the present study, these genes were up-regulated in the patient's transcriptome data with HS disease. By comparing the expression levels in HS, skin cancer, and survival analysis, three genes, TDO2, SERPINB4, and MMP3, were identi ed as having potential relevance for further investigation ( Table 2). The immunohistochemistry of these three genes in skin cancer is shown in Fig. 11, and the results reveal the up-regulation of the genes in skin cancer. On the other hand, TDO2, SERPINB4, and MMP3 genes were used to identify the correlation between their expression level and the immune in ltration of the skin tumor microenvironment. Dendritic cells, T cell CD4+, T cell CD8+, monocytes, and neutrophils immune cells in TIMER web-server were used, and the positively correlated outputs were considered. These results indicated that the TDO2, SERPINB4, and MMP3 genes might regulate the skin tumor immune microenvironment by affecting the in ltration of immune cells (Fig. 12).

Discussion
Transcriptome meta-analysis, identi cation of hub-genes, lncRNA, and the potential mechanism of skin cancer induction are among the Insilico techniques employed in this work. The decrease in false-positive results and the bolding of common responses across studies are two of the most critical advantages of integrating transcriptome data. Linked to the concept of HS illness, the immune system reaction might be due to a later bacterial or viral infection. But the kind of immune response and cancer induction via this disease would enhance the understanding of disease therapy and management and also cancer induction prevention.
In summary, single microarray data functional analysis indicated up-regulation of genes involved in B-cell development/differentiation, in ammatory process control, and immune response. These terms clearly illustrate how an immune response is generated, but they are connected to the illness's second stage, which is a skin infection caused by external stimuli, owing to the HS's disease type. With a signi cant focus on the up-regulated DEGs as a consequence of combined data, it was demonstrated that genes are engaged in both illness preventive and carcinogenic metabolic pathways. As one of the critical squamous cell carcinoma tumor markers, SERPINB4 is proven to be associated with diverse kinds of skin disorders [38]. TDO2, which is involved in tryptophan amino acid metabolism, has been proven to be a critical component of many malignancies. The exact regulatory mechanism of TDO2 is not precise [39,40]. The ADAMDEC1 gene's physiological role is unclear, although it has been proven to be up-regulated in a number of in ammation-related disorders [41,42]. MMP3 and S100A9 are important genes in the development of cancer and immune responses [43][44][45][46]. The FCRL5 gene impacts B-cell development and differentiation, as well as a number of immune-related disorders [47,48]. As a key gene in the innate immune system, S100A7A plays a function in epidermal differentiation and in ammation [49]. It is widely recognized that this gene has a role in human skin defense against infections. CXCL13 has an in uence on in ammatory, infectious, and immunological responses [50,51]. This gene may also play a role in cancer's start and progression. The protein TNFRSF17 increases B-cell survival and regulates humoral immunity [52]. Both Gram-positive and Gram-negative bacteria have been proven to be sensitive to CXCL6 [53,54]. Another noteworthy outcome of this study was illness prediction, which was connected to the upregulated DEGs. Macroglobulinemia is a malignant plasma cell disease in which B cells generate excessive quantities of IgM M-proteins [55]. Lupus is an autoimmune illness in which the body's own tissues, including the skin, are attacked by the immune system [56]. Synovial in ammation occurs when the synovium of a joint becomes in amed [57]. The immune system assaults the body after consuming gluten, resulting in celiac disease [57]. Crohn's disease is one of the two primary types of in ammatory bowel disease (IBD) [58]. Hyperphosphatemia is an electrolyte condition de ned by an abnormally high phosphate concentration in the blood [59]. The majority of people have no symptoms, but some develop calcium deposits in soft tissue. Muscle spasms are frequent as a result of low calcium levels. According to the GO predicted for down-regulated genes, the keratinocyte differentiation metabolism, which is primarily in uenced by calcium, and corni cation, which is the gradual process of dead cell development on the skin to form a physical barrier, were both dysregulated. In the PPI network that was constructed, the chemokine-mediated signaling pathway obtained the highest score. Chemokine receptor-ligand combinations have a critical function in carcinogenesis, in addition to infection and in ammation [60].
The genes IL-6, FPR2, CCR7, CCL5, and CXCL10 were predicted to behave as hub-genes in response to HS. IL-6 is a pleiotropic cytokine with a broad spectrum of actions in the integrated immune response. One of IL-6's functions is immunological competence, which is de ned as a host's capacity to respond to infections [61]. FPR2 identi es formyl peptides generated by pathogens or host cells and contributes to host defense and cell clearance. FPR2 may also identify other ligands with a broad chemical variety that are produced at different phases of in ammation to either promote or resolve in ammation in order to maintain a balanced in ammatory response. The mechanism underpinning FPR2 activation and promiscuous ligand recognition is unknown [62, 63]. CCL5 is categorized as a chemokine or chemotactic cytokine. T cells, eosinophils, and basophils are all chemotactic for CCL5, and it plays a critical role in attracting leukocytes to in ammatory areas [64, 65]. CCR7, a chemokine receptor, is a crucial organizer of the immune system's primary response. CCR7, a CC chemokine receptor, has been identi ed as a critical regulator of B and T cell tra cking to secondary lymphoid organs in homeostasis [66]. In ammatory diseases, including viral infections, immunological dysfunction, and tumor development, have been associated with variations in CXCL10 expression levels. CXCL10 is also utilized as a biomarker to predict the severity of a number of illnesses. Infectious illnesses, chronic in ammation, immunological dysfunction, tumor development, metastasis, and dissemination are all connected to CXCL10 in humans [67,68]. The predicted up-and-down-regulated DEGs include MIAT and POLR2J4. The MIAT lncRNA is proved to be involved in numerous illnesses and is categorized as a disease-related lncRNA [69]. On the other hand, POLR2J4 as a pseudogene is an RNA polymerase II subunit J4.
The functional enrichment of co-expressed genes with hub-genes and up-regulated lncRNA indicates that both created networks have a tight link to the immune response. GO terms such as cytokine-cytokine receptor interaction, interleukin signaling pathways, T cell activation, and reaction to bacteria were among the most relevant terms. Alongside de ning the signi cant pathways in conjugation with both hub-genes and lncRNA, this built network also provided insight into the characterizing proteins with unknown functions.
In the nal part of the study, a combination of gene expression, survival, immunohistochemistry, and immune in ltration studies linked to skin cancer was done. The data indicate that among the hub-genes, CCL5 and CXCL10 genes exhibited a substantial up-regulation in skin cancer. On the other hand, among the up-regulated DEGs of merged data, TDO2, ADAMDEC1, and CXCL13 exhibited up-regulation in skin cancer. But, for correct knowledge of the in uence of gene up-or down-regulation on cancer development, survival studies were done for each gene to examine the consistency or inconsistencies linked to the expression level of genes in HS, skin cancer, and survival patterns. Putting together, low expression of TDO2, SERPINB4, and MMP3 proved to enhance the survival rate of patients with skin cancer, and it was in contradiction with examined transcriptome data which indicates the up-regulation of these genes in HS illness. Immunohistochemistry con rmed TDO2, SERPINB4, and MMP3 gene over-expression in skin cancer from various individuals. It has been shown that the immunological state of the tumor microenvironment and the molecular expression of the immune checkpoint can predict the immunotherapy reactivity of patients. It was found that there is a moderate to a signi cant association between TDO2, SERPINB4, and MMP3 and immune cells. In short, TDO2 exhibited a connection with dendritic cells, T cell CD4+, monocytes, and neutrophils. SERPINB4 indicated a link with T cell CD4+, and MMP3 demonstrated a correlation with monocytes. In malignancies, immune cell in ltration is highly complex, which demands additional in-depth investigation.
Using herbal substances for skincare has been proved to be one of the most reliable strategies due to its favorable molecular effects. To this goal, in the present work, prospective pharmacological targets for SERPIN B4, MMP3, and TDO2 proteins were described using the TCMD database. Through a literature analysis of the anticipated compounds, it was proven that all the compounds have substantial antioxidant activity and, among them, Magnolol, Salicylic acid, and Piceatannol compounds have been demonstrated to be anti-cancer agents [70][71][72].

Conclusion
In conclusion, the present work is intended to undertake a thorough bioinformatics analysis of data relevant to HS illness. IL-6, FPR2, CCR7, CCL5, and CXCL10 were predicted to be hub-genes in response to HS illness. Predicted MIAT lncRNA, together with the stated hub-genes, are expected to be engaged in the immunological response of the cell against external stimuli. The Top-ten up-regulated DEGs, hub-genes, and predicted lncRNA was subject to clinical signi cance study, and TDO2, SERPINB4, and MMP3 genes exhibited clinical relevance owing to their up-regulation in HS disease. These genes are also favorably associated with dendritic cells, T cell CD4+, monocytes, and neutrophils. On the other hand, Piceatannol, Salicylic acid, and Magnolol herbal ingredients were demonstrated to be inhibitors of SERPIN B4, MMP3, and TDO2 proteins, respectively. The output of the present work will aid in better de ning the HS illness and subsequent cancer induction process.

Declarations
Con ict of Interest None Figure 1 Volcano plots showing the DEGs in three different studies, A) GSE72702 B) GSE128637 C) GSE148027.
Red and blue colors demonstrate up and down-regulated genes, respectively. The x-axis demonstrates the fold change, and the y axis demonstrates the -log10 P-values. The expression level of three top up-and down-regulated genes in each study. D) GSE72702, E) GSE128637, F) GSE148027, the x-axis demonstrates the fold change. The adjusted P-value was considered < 0.05.    Macro-modules prediction of constructed PPI network. Using the clusterONE plugin, nine signi cant modules were predicted in the constructed network. GO Enrichment analysis of predicted macro-modules using (https://metascape.org/). The x-axis demonstrates the -log10 (P-value). Bar graph of enriched terms among the hub and co-expressed genes colored based on the P-values.

Figure 7
Top ten predicted hub-genes with four different algorithms using cytohubba Cytoscape plugin, A) MNC B) Degree C) Betweenness D) Closeness, E) constructed co-expression network (http://www.pathwaycommons.org/pcviz) for the most signi cant hub genes include, IL6, CXCL10, FPR2, CCR7, CCL5. Genes with the highest number of connections in the network are detected as hub genes.
The image shows the degree of importance of hubs through a color scale ranging from red (the most critical) to yellow (less critical).