Potential Pathogenic Genes and Mechanism of Ankylosing Spondylitis: A Study Based on WGCNA and Bioinformatics Analysis

Bo Wu Jilin University First Hospital Jing Yu Jilin University First Hospital Yibing Liu Jilin University Gaojing Dou Jilin University First Hospital Zhiyun Zhang Jilin University Xuepeng Pan Jilin University First Hospital Hongyu Wang Jilin University Pengcheng Zhou Jilin University First Hospital Dong Zhu (  zhu_dong@mail.jlu.edu.cn ) Jilin University First Hospital https://orcid.org/0000-0001-9422-2103


Introduction
Ankylosing spondylitis (AS) is an insidious, progressive, and debilitating form of autoimmune arthritis with its primary target being the axial skeleton [1]. It can cause characteristic in ammatory, back pain, and reduce spinal mobility [2], leading to structural and functional impairments and a decrease in quality of life [3]. It is reported that in the United States, such a disease is more likely to happen in men before the age of forty, with a prevalence of about 0.5% [4]. Besides, early symptoms of AS are often not apparent, and most of the patients can't be diagnosed at the early stage. That usually leads to a delay in AS treatment and loss of the best time for treatment.
The current treatment for AS is often through drugs. Several medicines are applied to stop or delay patients' spinal problems and relieve pain. NSAIDs, which usually is the rst drug given to patients, can stop the body from making in ammation-causing chemicals called prostaglandins. Corticosteroids, TNF inhibitors, and Interleukin inhibitors (IL-17 inhibitors) are also available to relieve the suffering of patients. Currently, the prognosis of AS has an individual difference. There is no radical cure at present. We can only control the disease progression through targeted treatment.
AS is characterized by a complex pathological process. The basic pathology is primary, chronic, pannus destructive in ammation, and Ligament ossi cation is a secondary repair process. The lesions typically begin in the sacroiliac joint and slowly travel up the spine, involving the synovium and capsule of the intervertebral facet joints and the soft tissue around the spine. At the late stage, it can make the soft tissue around the entire spine calcify, ossify, and lead to severe hunchback. The disease can also spread downwards at the same time, affecting both hips and, in a few cases, knees. Most patients have a gene that can make a protein called HLA-B27, which is widely believed to have a strong connection with AS. Currently, only one genetic biomarker for diagnosing ankylosing spondylitis -HLA -B27 was reported. The rest biomarkers are still not reported [5], and the uses of the resulting biomarkers for AS are widespread.
Also, some researchers have suggested that eukaryotic translation elongation factor 1 epsilon 1(EEF1E1) may be a potential genetic biomarker for diagnosing AS. In patients with AS, the expression levels of these genes were signi cantly up-regulated, and the study showed that EEF1E1 participates in AS by in uencing the aminoacyl-tRNA biosynthesis [5]. Another research suggested that another two genes named MAPK7 and NDUFS4 played important roles in the pathogenesis of AS via the cAMP-mediated signaling pathway. Both of them could be targeted by Indomethacin [6]. Nevertheless, the study of AS is relatively super cial, and its mechanism is still unclear. So, we attempt to nd new biomarkers and increase the mechanistic understanding by using an analysis of gene expression microarray pro ling, providing early prevention and management of AS [7].
Weighted gene co-expression network analysis (WGCNA) is a new system biology method and is increasingly used in bioinformatics for analyzing gene expression chip mapping data [8][9][10]. By constructing a gene co-expression network, highly correlated genes were clustered into several modules. When a module is associated with external information, biologically interesting modules are detected. Key genes that play a key role in disease development have been identi ed in interesting modules related to important biological functions or pathways. Also, WGCNA can be used to screen candidate biomarkers or therapeutic targets. Therefore, we conducted this study to look for new biomarkers, related genes, or potential mechanisms related to AS.

Data Collection and Preprocessing
The gene expression pro le (GSE25101) of AS was downloaded from the Gene Expression Omnibus (GEO)database. GSE25101 contains 16 normal samples and 16 AS samples [45].

Co-expression Network Construction
The "WGCNA" package [10] in R software (version 4.0.0) was used for the network construction. Samples conforming to Z.K <2.5 were considered peripheral and deleted from the expression and trait data. Pearson correlation coe cients were calculated for all gene comparisons, and the soft thresholding power was evaluated. Then, the weighted network adjacency matrix was calculated based on a ij = |cor (x i , x j )| c. X i and X j are nodes I and J of the network, which is determined by scale-free topology criteria [11]. The adjacency matrix was transformed into a topological overlap matrix to identify highly interconnected gene modules and clusters. Topological overlap measure (TOM) was used to determine the interconnectivity of the network. Gene module detection USES hierarchical clustering method was based on different measures (1-TOM) [12]. The dynamic branch cutting method was used to identify the branches of the hierarchical cluster tree [13].
Identi cation and functional annotation of clinically important modules Modules closely related to biological or clinical information were selected from hierarchical clustering as interesting modules for subsequent analysis. In the process, gene signi cance (GS), module signi cance (MS), and module eigengene (ME) was calculated. GS was de ned as the negative logarithm of the pvalue, and MS was the average genetic meaning of the entire module gene. ME was the rst principal component of a given module. Signi cance between ME and clinical traits was also calculated through a correlation analysis because modules with high salience were path-related and could be candidates [14][15]. Gene ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomics (KEGG) enrichment by Annotation, Visualization, and Integrated discovery database [16] were performed to further validate the selected modules by looking for potential mechanisms and biological pathways.

Identi cation and validation of hub genes
The hub genes of interesting modules were determined by the absolute value of the members of the gene module >0.9 and the signi cance of the gene characteristics >0.22 [17]. Then, a PPI (protein-protein interaction) network of DEGs was constructed through the Search Tool for the Retrieval of Interacting Genes (STRING) database. A medium con dence (0.4) was applied to lter the interactions. Detection was performed based on Cytoscape (version 3.4.0) software to reveal modules of the PPI network [18][19]. In addition, the miRNA targeting those hub genes were predicted by FunRich software (version 3.1.3).
The miRNA-mRNA interaction networks were also plotted using Cytoscape.

Results
Calculate soft threshold, construct co-expression matrix and partition module According to the distribution of the scale-free network, similarity and anisotropy coe cients among genes were calculated, and the cluster tree of the system between genes was constructed. Using the function of the WGCNA software package in R software, the optimal soft threshold = 8 is calculated to divide the co-expression module. After the soft threshold was determined, the dynamic shear tree method was used to preliminarily identify and merge similar modules.
Cluster analysis was carried out on the modules, and the modules close to each other were merged into new modules. The minimum number of genes in each gene co-expression network module was set as 30, and 14 modules were obtained, among which the gray module was the gene set that could not be aggregated to other modules ( Figure 1).

Screening of high-risk pathogenic gene modules for AS
The WGCNA software package was used to calculate the correlation between these modules and AS according to the feature vectors of each module, to build the cluster tree of each module and disease phenotype, and to calculate the Pearson correlation coe cient between different modules and AS. The results showed that 582 genes contained in the yellow (Classical Module) (r=0.43, P=1.4e-27) and 59 genes contained in grey60 (Hematological Module) (r=0.2, P=0.13) modules had the strongest correlation with AS. See Figure 2 for the results.  Table 2.

Biological function notes for AS hub genes
The genes of the Classical Module and Hematological Module were uploaded to DAVID's website for GO analysis and KEGG analysis to explore the biological functions of differentially expressed genes. The GO analysis results showed that the process of SRP-dependent cotranslational protein targeting to membrane, ribosome, and NADH dehydrogenase (ubiquinone) activity in the Classical Module was signi cantly abnormal, and the process of platelet activation, integrin complex, and extracellular matrix binding in Hematological Module was abnormal. And the KEGG analysis results showed that genes in the Classical Module are associated with Parkinson's disease, Huntington's disease, Alzheimer's disease; while genes in the Hematological Module have an association with Platelet activation, Tight junction, and ECM-receptor interaction. As shown in Figure 3, 4 and Table 3, 4.

Discussion
Ankylosing spondylitis is a chronic nonspeci c in ammation caused by a variety of factors. Due to its high rate of disability, it seriously affects patients' quality of life [20]. The pathogenesis of AS is complex, involving genetic factors, environmental factors, immune response, and other aspects. No speci c molecular mechanism has been reported so far. The therapeutic effect is not satisfactory and the economic burden on patients is increased [21]. In this study, the WGCNA algorithm was used to explore the molecular mechanism of AS and provide a basis for new therapeutic strategies In this study, after constructing a co-expression matrix and Partition Module, and screening of high-risk pathogenic gene modules for AS, 582 genes in the Classical Module and 59 genes in the Hematological Module were identi ed related to AS. GO and KEGG analyses were performed to gain further insights into the molecular mechanism of the AS.
The GO and KEGG analysis results showed that the process of SRP-dependent cotranslational protein targeting to membrane, ribosome, and NADH dehydrogenase (ubiquinone) activity was signi cantly related to the genes of the Classical Module, while the genes in the Hematological Module enriched in the process of platelet activation, integrin complex, and leukocyte migration. The SRP pathway is closely related to the level of membrane protein and secreted protein in eukaryotic cells [22], which may promote AS by regulating the secretion of in ammatory mediators and the occurrence of in ammation. According to a recent report, ubiquinone is not only a mobile component of the mitochondrial electron transport chain but also a membrane antioxidant [23]. Besides, it was found that the reaction products of the antioxidant activity of ubiquinol stimulated the formation of inorganic and organic oxygen radicals [24], which can lead to in ammation. The ribosome was reported that anti-ribosomal P protein IgG can be a diagnostic indicator for systemic lupus erythematosus and rheumatoid arthritis (autoimmune diseases) [25]. That means anti-ribosomal P protein IgG may also be a potential diagnostic indicator for AS. At present, platelet activation was pointed as a sign of worsened AS [26], which supports the results of this study. Besides, some drugs have been shown to relieve the symptoms of ankylosing spondylitis by inhibiting platelet activation [27]. That suggests a novel treatment strategy for AS. For the integrin complex, as the key protein of the integrin signaling pathway, the ILK protein was identi ed with higher expression levels in AS patients than in healthy people [28], which may be a potential marker of AS.
To obtain the hub genes among the 582 genes contained in the Classical Module and 59 genes contained in the Hematological Module, they were analyzed by using the PPI network according to the STRING database. Finally, we got the top20 hub genes in each module, including DPY30, MRPS33, RPL23 in the Classical Module, and TSPAN9, TRELM1, CDKN1A in the Hematological Module. In addition, the miRNA-mRNA interaction network was plotted and 38 miRNAs were predicted targeting genes in Classical Module, and 64 miRNAs were identi ed targeting Hematological Module.
DPY30, the core subunit of the SET1/MLL family histone H3K4 methyltransferase complex, can affect glucose homeostasis [29]. DPY30 was reported to control proliferation by regulating the inhibitor of differentiation (ID) protein expression, leading to reactive oxygen species (ROS) levels rise and aging bypasses [30]. Cell aging and increased ROS levels may be one reason for in ammation in AS. MRPS33, one of the genes that encode the small subunit of the mitochondrial ribosome, is reported to show hypermethylation in older muscle tissue [31], which leads to the loss of muscle mass and strength in the elderly.
That may support the myophagism in AS patients. Besides, the adult ies with the MRPS33 gene knocked out are more likely to develop cardiomyopathy [32], and the symptoms of cardiovascular disease also appeared as one of the accompanying symptoms of AS patients [33]. RPL23, a negative regulator of cellular apoptosis, is proved that it can encode autoreactive proteins, reacting with T cells and autoantibodies [34]. This reaction may be one of the components of the in ammatory response in patients with AS. Besides, the development of AS occurs due to excessive proliferation of broblasts [35], while RPL23 negatively regulates cellular apoptosis via the RPL23/Miz-1/c-Myc circuit [36], which may expose the mechanism of the development of AS.
TSPAN9, a gene associated with platelet activation [37], can be used as a target for platelet activation in patients with AS, and platelet activation plays a role in heart disease secondary to AS [39]. TSPAN9 has also been linked to decreased muscle tone and joint relaxation [40], which also associates with AS. TRELM1, a gene associated with platelet dysfunction, can also lead to heart disease secondary to AS [39]. Moreover, TRELM1 binds to brinogen and plays a role in bleeding caused by in ammatory damage. CDKN1A, cyclin-dependent kinase inhibitors involved in cell cycle regulation, can regulate the proliferation of skeletal muscle cells. Besides, the CDKN1A gene is an in ammatory response gene in the central nervous system, which may be related to the nervous system secondary to AS.
There are several highlights in the study. WGCNA, an e cient systems biology algorithm, was used for the rst time to analyze the drivers of AS. Secondly, WGCNA has a unique advantage in dealing with gene expression datasets. The results of this study not only con rmed the results of previous studies but also provided new biomarkers for future studies of AS. However, this study also has some limitations and needs independent cohort and/or in-vitro experiments for validation. Whether these biomarkers can be applied in clinical practice remains to be further veri ed.

Conclusion
In this study, WGCNA, an e cient system biology algorithm, was used to analyze the high-risk pathogenic driver genes of ankylosing spondylitis, further elucidating the molecular mechanism of the occurrence and development of AS, and providing candidate biomarkers for the clinical diagnosis and treatment of AS, providing a new idea for the further study of ankylosing spondylitis. Acknowledgements:Not applicable 40. Fanizza I, Bertuzzo S, Beri S, Scalera E, Massagli A, Sali ME, Giorda R, Bonaglia MC. Genotype-phenotype relationship in a child with 2.