Bioinformatics Analysis of Gene Expression Proles of Immune-mediated Necrotizing Myopathy

Background: Immune-mediated necrotizing myopathy (IMNM) is a type of autoimmune myopathy with limited therapeutic measures. This study aims to elucidate the potential biomarkers and investigate the underlying mechanisms in IMNM. Materials and Methods: Microarray datasets in GSE128470 and GSE39454 were obtained from the Gene Expression Omnibus. Differentially expressed genes (DEGs) were ltrated by limma package in R statistical software. Functional enrichment analyses were performed using DAVID online tools. STRING database was used to construct protein ‐ protein interaction (PPI) networks. The module analysis and hub genes validation were performed using Cytoscape software. Results: Integrated analysis of two databases revealed 160 co-expressed DEGs in IMNM, including 80 downregulated genes and 80 upregulated genes. GO enrichment analysis revealed that sarcomere is the most signicantly enriched GO term within the DEGs. KEGG pathway enrichment analysis revealed signicant enrichment pathways in cancer. A PPI network consisting of 115 nodes and 205 edges were constructed and top 20 hub genes were identied. Two key modules from the network were identied. Eight hub genes in module 1 (MYH3, MYH7B, MYH8, MYL5, MYBPH, ACTC1, YNNT 2 and MYOG) are tightly associated with skeletal muscle construction. Seven hub genes in module 2 (C1QA, TYROBP, MS4A6A, RNASE6, FCGR2A, FCER1G and LAPTM5) mainly take part in immune response. Conclusions: Our research indicated that cancer-related pathways, skeletal muscle construction pathways and immune-mediated pathways might participate in the development of IMNM. Identied hub genes may serve as potential biomarkers or targets for early diagnosis.


Background
Immune-mediated necrotizing myopathy (IMNM), also described as necrotizing autoimmune myopathy (NAM), is a rare type of idiopathic in ammatory myopathies characterized by symmetric weakness of proximal limb muscle, high level of serum creatine kinase, infrequent extra-muscular involvement, along with distinctive histopathological characteristics including marked myo ber necrosis and regeneration with a lack of lymphocytic in ltrate [1]. The prevalence of IMNM is estimated to be around 10 cases per 100,000 people [2,3]. IMNM is divided into three subtypes by serology markers: anti-signal recognition particle (anti-SRP) IMNM, anti-3-hydroxy-3-methylglutaryl-CoA reductase (anti-HMGCR) IMNM, and autoantibody-negative IMNM [1]. To date, the causes and pathological mechanisms underlying IMNM are still poorly understood. Several different factors are thought to contribute to IMNM, indicating the heterogeneity of the etiology of this entity. Substantial evidence exists to support the association between IMNM and various risk factors, including statin exposure, viral infections such as hepatitis C and HIV, malignancy and connective tissue diseases. Furthermore, a genetic component with strong immunogenetic predisposing factors has been postulated to predispose individuals to IMNM in the human leukocyte antigen (HLA)-DRB1 genotyping studies [4]. No random clinical trials exist to guide treatment strategies in IMNM with the majority of the recommendations based on case series, longitudinal cohort studies, and experts' recommendation. It is agreed that treatment of IMNM ought to be initiated early and individualized to prevent long-term disability or recurrence [1]. However, treatment options of IMNM are still scarce and controversial. In the era of molecular-targeting drugs, it seems of utmost importance to explore its molecular mechanisms for the improvement of the diagnostic and therapeutic effects of IMNM.
Bioinformatics methods have been extensively used in medical oncology to evaluate the molecular mechanism in different types of human carcinogenesis. This study aims to identify the key genes and pathways that may be involved in the underlying pathomechanism of IMNM, narrow the target range and provide target therapy directions for further research using bioinformatics methods. Two original microarray datasets, GSE128470 and GSE39454 were acquired from the Gene Expression Omnibus (GEO) databases. Screening of differentially expressed genes (DEGs) were conducted followed by functional enrichment analysis. Protein-protein interaction (PPI) networks were established to select hub genes. Our results demonstrated that identi ed hub genes may serve as potential prognostic biomarkers in IMNM and pathways in cancer, skeletal muscle construction and immune response facilitate the understanding of the pathogenesis in IMNM.

Microarray Data
The raw gene expression pro les of GSE128470 and GSE39454 were acquired from the Gene Expression Omnibus database (https://www.ncbi.nlm.nih.gov/geo/) by searching the following key terms: ("necrotizing myopathy" OR "immune-mediated necrotizing myopathy" OR "necrotizing autoimmune myopathy"). GSE128470 was sequenced on the platform GPL96 (HG-U133A; Affymetrix Human Genome U133A Array) and GSE39454 was sequenced on the platform of the GPL570 (HG-U133_Plus_2; Affymetrix Human Genome U133 Plus 2.0 Array). Table 1 illustrated the information of the datasets. Data processing and differential expression analysis The data processing was performed utilizing the R statistical software (version 4.0.1; https://www.rproject.org/) and Bioconductor analysis tools (http://www.bioconductor.org/). The original les were transformed to probelevel data and converted into an international standard name for genes (genesymbol). Probe sets without corresponding gene symbols were removed. If several probe symbols mapped to one particular gene, the mean of the probes was calculated as the nal gene expression value. Screening of DEGs between necrotizing myopathy (NM) samples and normal (NL) samples was conducted using the limma package in R after background correction and quantile normalization. Only genes differentially expressed with a Pvalue < 0.05 were selected and saved in CSV les for further analyses. Subsequently, DEGs were identi ed with the cutoff criteria of an absolute log fold-change (FC) value (|log2FC|) > 1 in GSE128470 and |log2FC|>0.653 in GSE39454 so as to get su cient co-expressed DEGs in both datasets for further enrichment analysis.

PPI network construction and hub genes identi cation
The Search Tool for the Retrieval of Interacting Genes [6] (STRING, version 11.0; https://string-db.org/) is a pre-computed global resource for demonstrating the interaction information derived from predictions and experiments. In the current study, we constructed PPI networks of the overlapped DEGs utilizing the STRING database with a con dence score of ≥ 0.4 for signi cant differences. Cytoscape software (version 3.6.1, http://www.cytoscape.org/), a platform for visualizing complex networks and integrating these with any type of attribute data, was implemented to establish a PPI network, in which nodes represent proteins and edges represent the relationship of proteins. The connectivity degree was statistically analyzed by their combined scores based on the database algorithms. The highly connected nodes in the network were identi ed and labeled as hub genes utilizing the cytoHubba tool kits of the Cytoscape. The MCODE plug-in was applied to quantify the degree of each protein node and explore signi cant modules. MCODE scores > 5, degree cut-off = 2, node score cut-off = 0.2, Max depth = 100 and k-score = 2 were de ned as the threshold values.

Heterogeneity Test and Preliminary Screening of DEGs
All data were illustrated by box plots. As shown in Fig. 1, the median of NM and NL samples was almost on the same line, indicating superior effectiveness of standardization. Volcano Plots were applied to visualize the DEGs, in which red dots represent the upregulated genes and blue dots represent the downregulated genes. A total of 1070 DEGs (526 upregulated and 543 downregulated genes) in GSE128470 and 704 DEGs (453 upregulated and 251 downregulated genes) in GSE39454 were identi ed according to the cut-off criteria described above (Fig. 2). Hierarchical clustering analysis was carried out separately for all samples in the two cohorts according to the overlapped DEGs, revealing that the patients were clearly separated into two clusters based on the expression levels of the identi ed DEGs (Fig. 3). The consistently up-and downregulated genes in both cohorts were selected and presented using Venn diagram. 160 signi cant co-expressed DEGs were nally con rmed, including 80 downregulated DEGs and 80 upregulated DEGs (Fig. 4). The top 100 co-expressed DEGs were displayed by heatmap visualization in Fig. 5, which also revealed that the DEGs expression patterns varied between the NM and NL groups.

Functional enrichment analysis
In order to better look into the biological functions of the DEGs in IMNM, the overall 160 DEGs were predicted by DAVID for further analysis. A total of 94 GO terms were associated with the DEGs with 21 signi cantly enriched GO terms that met the cut-off criteria of P-value < 0.05. The top ten signi cantly enriched terms in biological process, cellular component and molecular function were shown in Fig. 6. In biological process term, we found that the DEGs were mainly abundant in regulation of transcription (DNA-templated), ossi cation, muscle lament sliding, insulin-like growth factor receptor signaling pathway and skeletal muscle contraction (Fig. 6A). In molecular function term, the DEGs were mostly enriched in protein binding, identical protein binding, calcium ion binding, protein homodimerization activity and calmodulin binding (Fig. 6B). In cellular component term, the DEGs were predominantly involved in extracellular exosome, cytosol, membrane raft, lysosome, sarcomere and myosin lament ( Fig. 6C). KEGG pathway analysis presented that adrenergic signaling in cardiomyocytes, pathways in cancer, small cell lung cancer, ECM-receptor interaction, tryptophan metabolism, focal adhesion, oxytocin signaling pathway and cGMP-PKG signaling pathway were most highly related (Fig. 6D).

PPI network and module analysis
A PPI network consisting of 115 nodes and 205 edges was constructed, setting P < 0.05, coe cients of |r| > 0.4 as a strict threshold. Figure 7 illustrated the co-expression network performed by Cytoscape, displaying the interactions of the overlapping DEGs. Then, by using the MCODE plug-in, we identi ed two key modules from the PPI network (Fig. 8). The module 1, which exhibited the highest score, consisted of 8 nodes (i.e. MYH3, MYH7E, MYH8, MYL5, MYBPH, ACTC1, YNNT 2 and MYOG) and 25 edges. The module 2 containing 7 nodes (i.e. C1QA, TYROBP, MS4A6A, RNASE6, FCGR2A, FCER1G and LAPTM5) and 20 edges, also possessed a strong connection. These ndings implied that the afore-mentioned genes with higher hub degrees tend to exert a potential in uence in IMNM.

Identi cation of hub genes
Hub genes from the whole network were extracted using the plug in cytoHubba application. Top 20 hub genes were selected based on multiform algorithm (i.e. MCC, MNC, DMNC and Degree) as shown in  Fig. 9. Discussion IMNM is a subtype of idiopathic in ammatory myopathies associated with different environmental and genetic and cancer risks factors. Despite opportune immunosuppressant treatment, a majority of IMNM patients still suffer from persistent and signi cant weakness with a relatively poorer prognosis compared with other types of myositis [7][8][9]. Therefore, comprehending the genetic mechanisms and discovering novel molecular targets are crucial for improving the therapeutic effects. This study analyzed gene expression pro les, identi ed IMNM-related DEGs and enabled identi cation of potential candidate genes and pathways, aiming at investigating the pathogenesis of IMNM at the genome level and nding target therapy directions.
After screening differentially expressed genes, 160 signi cant co-expressed DEGs in two databases were identi ed, including 80 downregulated DEGs and 80 upregulated DEGs. GO enrichment analysis revealed that sarcomere is the most signi cantly enriched GO term within the DEGs. Muscle contraction is controlled by a family of sarcomeric myosin motors, which consist of myosin heavy chains and myosin light chains. Proximal muscle weakness is the typical clinical characteristics of IMNM patients, suggesting the destruction of the muscle's essential contractile machinery. Therefore, it is hypothesized that the immune-induced destruction of chemo-mechanical properties of the myosin motors may be potentially critical in the pathological mechanism of IMNM. However, whether or not the muscle weakness in IMNM patients is caused by impairment of sarcomere dynamics remains to be fully clari ed.
KEGG pathway enrichment analysis focused on eight signi cant pathways, of which small cell lung cancer, ECM-receptor interaction and focal adhesion fall within cancer-related pathways. ECM-receptor interaction pathway is crucially important in the process of tumor shedding, adhesion, degradation, movement and hyperplasia. The function of ECM has been veri ed in various cancers such as prostate [10], gastric [11] and breast cancer [12], etc. Focal adhesion signaling hubs critically regulated cell behavior, impacted on tumor cell survival and served as potential cancer targets [13]. Notably, cancer can be diagnosed prior to, concurrent with, or after the diagnosis of IMNM, suggesting that the two courses might be closely related [14,15]. A pronounced higher risk of cancer has been demonstrated in HMGCRpositive IMNM [16] in line with previous data showing a trend towards increased incidence of cancer in IMNM [17]. Consequently, it's tempting to speculate that the anti-tumor immune response may act as a key process in the pathogenesis of IMNM, as recently outlined in other autoimmune settings [14]. Several lines of evidence support the hypothesis that a cancer-associated autoimmune mechanism might participate in the process of idiopathic in ammatory myopathies [15]. One possible mechanism is that mutant proteins in tumors may become the target of an antitumor response which could be redirected to the wild-type proteins, ultimately resulting in damage to normal healthy tissues [15]. Another possible mechanism is that the overexpression of an autoantigen by tumor cells leads to abnormal processing or cleavage of the protein, generating cryptic epitopes not previously encountered by the immune system, initiating autoimmunity and damaging muscle and other target tissues [15]. Our results provided additional evidence for the potential association between IMNM and cancer.
We afterwards constructed a PPI network consisting of 115 nodes and 205 edges. Since modules with more complex interactions tend to exhibit more critical roles in the underlying mechanisms, two signi cant modules in PPI network were identi ed and taken into further analysis. Notably, all 15 genes identi ed in module analysis were simultaneously hub genes, indicating that these genes might be crucial in the development of IMNM. Eight hub genes in module 1 (MYH3, MYH7B, MYH8, MYL5, MYBPH, ACTC1, YNNT 2 and MYOG) are tightly associated with skeletal muscle construction. Seven hub genes in module 2 (C1QA, TYROBP, MS4A6A, RNASE6, FCGR2A, FCER1G and LAPTM5) mainly take part in immune response. Interaction network for the top 20 hub genes supports the distribution of these two pathways as well.
Eight hub genes identi ed in module 1 encode proteins that are either a muscle-speci c transcription factor that can induce myogenesis, namely MYOG, or components of the skeletal muscle sarcomeric lament, including MYH3, MYH8, MYH7B, TNNT2, MYL5, MYBPH and ACTC1. Mechanistically, myogenin binds to the myomaker promoter, regulates myocyte fusion, and is required for expression of myomaker [18]. Given that the prominent ber necrosis and regeneration are the pathological characteristics of IMNM, we proposed MYOG as a regulator of the enhanced regenerative responses in muscle homeostasis. The molecular changes underlying muscle weakness remain unclear in IMNM. Lassche et.al conducted an experiment on contractile function in IBM and found that the reduced formation of attached actin-myosin cross-bridges may lead to muscle ber weakness [19]. A study on nemaline myopathy with de nite mutations also revealed impaired actin-myosin interaction causing decreased muscle ber force [20]. Therefore, our ndings suggest that the dysregulation of these genes associated with skeletal muscle construction are likely to be pathogenic in IMNM. The contractile function in IMNM have not been studied yet. Further researches are desperately needed for veri cation.
Evidence exists to support the strong immunogenetic predisposing factors contributing to IMNM [4,21,22]. Seven hub genes identi ed in module 2 are strongly linked to immune response, including C1QA, TYROBP, MS4A6A, RNASE6, FCGR2A, FCER1G and LAPTM5. The functions of these genes in immune system were listed in Table 3. Notably, C1QA is a component of calcium-dependent complex of the 3 subcomponents C1q, which was observed at the sarcolemmal level in IMNM patients, suggesting classical pathway activation (antibody-dependent) of complement [23]. In IMNM, the identi cation of a signi cant amount of CD4 + and CD8 + T cells, grouped in the endomysial and perimysial compartment, and a strong and ubiquitous up-regulation of MHC class I major histocompatibility complex on the sarcolemma and MAC deposition on endomysial capillaries strongly support the involvement of an immune-mediated process [24]. Allenbach et. al showed that sarcolemmal C5b-9 deposits correlated with necrosis and that the classical complement pathway was activated [23]. Evidence demonstrated that IMNM is essentially a Th1-M1-mediated disease in which high levels of IFN-γ, TNF-α, IL-12, and STAT1 functionally regulate the immune response [24]. However, the underlying immune response in IMNM still needs to be addressed in detail. Even if our ndings are not contributing directly or indirectly to the disease pathology, they nonetheless provide evidence in an accessible manner of a mechanism likely to be occurring in IMNM, which may guide a direction for future work. Discovery of these genes and identi cation of their roles in IMNM may offer useful information regarding the treatment of IMNM. Table 3 Hub genes strongly linked to immune response in module 2.

Gene
Functions in immune response C1QA C1QA, the initiator of the complement classical pathway, restrains the response to selfantigens by modulating the mitochondrial metabolism of CD8 + T cells, which can themselves propagate autoimmunity [25].

FCER1G
FCER1G, the high-a nity IgE receptor, participates in the process of immune cell in ltration [26].

LAPTM5
LAPTM5, encoding a membrane protein that localizes to intracellular vesicles, is essential for the lysosomal degradation of T cell and B cell antigen receptors through the transport endosomes to lysosomes [27].
TYROBP TYROBP associates with activating receptors recognizing major histocompatibility complex (MHC) class III molecules on the plasma membrane of NK cells.

RNASE6
RNASE6 belongs to the ribonuclease A (RNaseA) superfamily and is secreted by innate immune cells with anti-infective and immune-regulatory properties [28].

Declarations
Availability of data and materials The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interest
The authors declare that they have no competing interests.

Funding
Funding information is not applicable.
Authors' contributions JS and SW performed data analyses and wrote the manuscript. JZ, JW, XC and JZ contributed signi cantly to data interpretation and manuscript revision. JS, SW, WZ and JG conceived and designed the study. All authors read and approved the nal version of the manuscript.       The PPI network of the top 20 hub genes. The shape of triangle represents upregulated genes; the shape of "V" represents downregulated genes. Abbreviations: PPI, protein-protein interaction.