Establishment of a Clinical Diagnostic Model for Gouty Arthritis Based on Serum Biochemical Profile: A Prospective Study

Objective: Gouty arthritis is a chronic disease caused by a disorder of purine metabolism. Its progression is relatively well defined and is divided into four stages. Clinical staging of gouty arthritis is generally based on blood uric acid levels and bursal fluid examination, both of which can produce false positives. This paper attempts to construct a clinical diagnostic model based on routine blood test data, so as to avoid tedious procedures such as bursal fluid examination and to predict the direction of disease development. Methods: Serum samples from 579 patients were collected and divided into a training set (379 cases) and a validation set (200 cases). Through a series of multivariate statistical analyses and correlation analysis, 12 serum biochemical indicators were selected as risk factors for gouty arthritis, yielding the serum biochemical profile of gouty arthritis. Based on the biochemical indicators of the 379 training patients, a clinical diagnostic model of gouty arthritis was established and visualized in R to predict the probability that a patient is at each stage; the stage with the highest probability was taken as the diagnosis. The model was then evaluated on the 200 validation patients. Results: The overall area under the ROC curve of the clinical diagnostic model was 0.95, and its Kappa coefficient was 0.80. Conclusion: This model can be applied to clinical prediction and basic research to improve the accuracy of gouty arthritis staging.


Introduction
Gouty arthritis (GA) is a chronic disease caused by disorders of purine metabolism. Its progression is relatively well defined and can be roughly divided into four stages: hyperuricemia (HUA), acute gouty arthritis (AGA), the intermittent period (DIP), and chronic gouty arthritis (CGA). When the uric acid concentration in blood exceeds 7 mg/dL or 420 μmol/L (hyperuricemia, HUA), monosodium urate (MSU) crystals may accumulate in the joint capsule, bursa, cartilage, bone, or other periarticular tissues. The crystals stimulate the synovial membrane of the joint and produce pathological reactions such as synovial vasodilation, increased permeability, and leukocyte exudation (acute gouty arthritis, AGA). However, some patients with HUA never progress to AGA and show only elevated serum uric acid [1], and statistics show that up to 30% of AGA patients have normal uric acid values [2,3]. Therefore, a high blood uric acid concentration is a necessary but not sufficient condition for AGA. Clinically, AGA is most frequently encountered in the major joints, especially the first metatarsophalangeal joint, ankle, and foot joints. Other complications include chronic renal injury, ureteral calculi, and joint deformity [4,5]. Long-term, intermittent, repeated episodes of AGA lead to tophus deposition and eventually evolve into chronic gouty arthritis (CGA), with an asymptomatic interval (the intermittent period, DIP) between AGA and CGA. An abridged overview of GA is provided in Figure 1.

Figure 1. Abridged general view of the evolution of gouty arthritis
According to statistics, the incidence of GA is increasing year by year worldwide (about 1-2% in 2018), especially in developing countries [6,7]. With the deepening of research in recent years, GA has gradually come to be defined as an autoimmune inflammatory disease [8]. Researchers have reached some preliminary conclusions on the pathogenesis of AGA and CGA. The pathogenesis of AGA may be related to activation of Toll-like receptors, the NLRP3 inflammasome, or P2X7 receptors [9], while CGA may be related to stimulation of neutrophil extracellular trap formation [10] or the endoplasmic reticulum stress response [11]. Some scholars have also analyzed the metabolites of gout patients, HUA patients, and healthy volunteers by NMR [12] and IC-MS [13], and thus obtained relevant biomarkers of gout and HUA.
The clinical diagnosis of GA is generally based on joint swelling [14], CT [15], smear tests [16], and the patient's own description, all of which carry uncertainty (for example, MSU crystal smears can also be positive in hyperuricemia) [17]. At present, GA patients at different stages receive different treatments, such as diet control, drug treatment, and surgical treatment. However, in the absence of clear diagnostic markers, there is still a lack of methods to prevent the occurrence of GA and to predict its evolution. Therefore, white blood cell count (WBC), C-reactive protein (CRP), uric acid (UA), blood urea nitrogen (BUN), creatinine (Cre), hemoglobin (Hem), erythrocyte sedimentation rate (ESR), high/low-density lipoprotein (HDL/LDL), total cholesterol (TC), triglyceride (TG), demographic factors (age, body mass index (BMI), sex), and living habits (smoking and alcohol drinking) were considered as candidate risk factors for the progression of GA in this research. For most patients, the symptoms of GA and rheumatoid arthritis (RA) are similar, so it is easy to overlook the condition and miss the optimal treatment window. We therefore also distinguished the two by differences in their biochemical indicators. Furthermore, principal component analysis (PCA), orthogonal partial least squares discriminant analysis (OPLS-DA), non-repetitive one-way ANOVA [18], correlation analysis [19], and multiple logistic regression analysis [20] were used to screen the important indicators affecting GA at each stage and to distinguish GA from RA. Finally, multiple logistic regression was used to establish a clinical diagnostic model [21], so as to improve the success rate of clinical diagnosis and prediction. An overview of the study design is given in Figure S1.

Statistical analysis of patient information and serum biochemical indicators
Demographic data, living habits, comorbidities, disease durations, and medical situations were collected through questionnaires and case histories. UA, WBC, CRP, BUN, creatinine, hemoglobin, ESR, HDL/LDL, TC, and TG of the participants in each group were determined with an automatic biochemistry analyzer (ACA). The results of the statistical analysis are provided in Table 1 (training set), and the trend of the biochemical indicators across the five stages is shown as box plots (Figure 2). An RA group (n=32) was added to this study in order to find a method that could distinguish CGA from RA.

Principal component analysis and orthogonal partial least squares discriminant analysis
Multivariate analysis was performed in SIMCA-P 16.1 on the raw data provided in Table S1. To explore the intrinsic differences among the 6 groups (control, HUA, AGA, DIP, CGA, and RA), principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) were applied. The PCA score plot (Figure 3A) showed satisfactory separation among the 6 groups, with R²X = 1.00 and Q² = 0.978. The OPLS-DA score plot (Figure 3B) showed better separation, with R²X = 0.707, R²Y = 0.483, and Q² = 0.453. As can be seen from Figure 3, the biochemical profile of each group differed significantly from that of the control group, with the greatest difference between the CGA group and the control group. Furthermore, the serum biochemical profiles of CGA and RA were relatively different, indicating that the biochemical indicators selected in this study can adequately distinguish CGA from RA.

One-way ANOVA
The results of the non-repetitive one-way ANOVA on serum samples from the different patient groups are shown in Table S3, and the variance summary of the biochemical indicators in each group is provided in Table 2.
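As a rough illustration of what a one-way ANOVA computes for a single indicator, the following Python sketch derives the F statistic from between-group and within-group sums of squares. It is not the SPSS procedure used in the study; the function name `one_way_anova_f` and the example values are hypothetical and are not taken from Table S1.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups.

    Each inner list holds the values of one indicator in one group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up values for three groups, for illustration only.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom suggests that at least one group mean differs from the others.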

Correlation analysis of factors influencing the course of GA
R code written by our research group using the corrplot package was applied for correlation analysis of the biochemical indicators associated with the progression of GA. The correlation matrix heat map is shown in Figure 4, and detailed analysis data are provided in Table S4. Here, we assume that GA evolves gradually, from mild to severe, through the control, HUA, AGA, DIP, and CGA groups. As can be seen from Figure 4, LDL and TC were significantly correlated with each other, indicating that this correlation was not affected by the disease and had little significance for the evolution of GA. In addition, UA, creatinine, WBC, CRP, BUN, ESR, BMI, and HDL were significantly correlated with the progression of GA: UA, creatinine, WBC, CRP, BUN, and ESR were markedly positively correlated with progression, while HDL and BMI were markedly negatively correlated with it.
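The correlation between an indicator and disease progression can be sketched as a Pearson coefficient between the indicator values and an ordinal stage code. The Python sketch below assumes a coding of control = 0 through CGA = 4, which mirrors the mild-to-severe ordering described above but is not stated in the source; `pearson_r` and the example values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Assumed ordinal coding: control=0, HUA=1, AGA=2, DIP=3, CGA=4.
stage_code = [0, 1, 2, 3, 4]
indicator = [1.0, 3.0, 5.0, 7.0, 9.0]  # made-up values rising with stage
print(pearson_r(stage_code, indicator))
```

A coefficient near +1 means the indicator rises with stage, and a coefficient near -1 means it falls, matching the positive and negative correlations described above.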

Clinical diagnostic model associated with progression of GA based on serum biochemical indicators
The MASS, pROC, ggplot2, mlogit, rms, and survival packages in R were used to carry out ordinal multiple logistic regression for the clinical diagnostic model of GA progression. The regression coefficients and cut points of the dependent variable were not affected by the independent variables, and there was no multicollinearity among the independent variables. Before modeling, the dependent variable was transformed into the corresponding dummy variables, and the continuous variables were standardized (among the independent variables, "Sex" was categorical and the rest were continuous), followed by factor analysis. The training set (n=379) was used to build the model: the five stages of GA (control, HUA, AGA, DIP, CGA) were taken as the dependent variable, and stepwise logistic regression was performed for each stage. Combining the results of Sections "2.1" to "2.4", risk factors were screened to form the best logistic regression equation for each stage. The predicted probability of each sample under each regression equation was calculated, and the stage with the maximum probability was taken as the diagnosis. The validation set (n=200) was used to evaluate and verify the model. Details of the clinical diagnostic model of GA are provided in Table 3.
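The decision rule described above (one logistic equation per stage, diagnosis = stage with the highest predicted probability) can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: the coefficients, intercepts, and feature names are placeholders, and the real values are those reported in Table 3.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def standardize(value, mean, sd):
    """z-score a continuous indicator using training-set mean and SD."""
    return (value - mean) / sd


def stage_probabilities(features, models):
    """Evaluate every per-stage logistic equation for one patient.

    models maps stage -> (intercept, {feature_name: coefficient}).
    """
    probs = {}
    for stage, (intercept, coefs) in models.items():
        z = intercept + sum(c * features[name] for name, c in coefs.items())
        probs[stage] = sigmoid(z)
    return probs


def diagnose(features, models):
    """Return (stage with the highest probability, all probabilities)."""
    probs = stage_probabilities(features, models)
    return max(probs, key=probs.get), probs


# Hypothetical two-stage example with placeholder coefficients.
models = {
    "HUA": (0.0, {"UA": 1.0}),
    "CGA": (-1.0, {"Cre": 2.0}),
}
patient = {"UA": 0.5, "Cre": 2.0}  # already standardized values
print(diagnose(patient, models))
```

Because each stage has its own equation, the probabilities need not sum to 1; only their ranking is used for the diagnosis.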

Evaluation and verification of clinical diagnostic model associated with progression of GA
The validation set was then substituted into the above models, and the predicted results were output and compared with the actual results (Table S2). The receiver operating characteristic (ROC) curves of the five models are shown in Figure 5. The total area under the curve (AUC) of the clinical diagnostic model of GA progression was 0.9534. The Kappa coefficient was used to evaluate the consistency between the model's predictions and the actual results. Encouragingly, the Kappa coefficient of the clinical diagnostic model was 0.80, indicating substantial consistency (the magnitude of the Kappa coefficient is divided into five levels: 0.00 to 0.20, slight consistency; 0.21 to 0.40, fair consistency; 0.41 to 0.60, moderate consistency; 0.61 to 0.80, substantial consistency; 0.81 to 1.00, almost perfect consistency). This shows that the model has high accuracy and reliability.
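For readers unfamiliar with the Kappa coefficient, the sketch below computes Cohen's kappa from paired actual and predicted stage labels and maps it onto the five consistency levels quoted above. It assumes the unweighted form of kappa, which the text does not specify; the function names and example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(actual)
    # Observed agreement: fraction of samples with matching labels.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement from the marginal label frequencies.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def kappa_band(kappa):
    """Map kappa onto the five consistency levels listed in the text."""
    if kappa < 0:
        return "below the listed range"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Made-up labels for illustration only.
actual = ["AGA", "AGA", "CGA", "CGA"]
predicted = ["AGA", "AGA", "CGA", "AGA"]
k = cohen_kappa(actual, predicted)
print(k, kappa_band(k))
```

Under this banding, the reported value of 0.80 sits at the upper edge of the substantial level.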

Visualization of clinical diagnostic model associated with progression of GA
To broaden the application of the GA diagnostic model in clinical practice, a nomogram of the clinical diagnostic model was produced with the rms package in R (Figure 6). According to the regression coefficient of each influencing factor in the multiple regression model, each factor is assigned a score; the sum of the scores is converted into the probability of the outcome event, and the stage with the maximum probability is the diagnosis. This method transforms the complex regression equation into a simple visual chart, making the results of the prediction model more readable and useful.
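The scoring idea behind a nomogram can be sketched as follows: each predictor's contribution to the linear predictor is rescaled so that the predictor with the widest effect range spans 0 to 100 points, and the total points are mapped back to a probability through the logistic function. This Python sketch follows that general construction and is not a reproduction of the rms `nomogram` output; the coefficients, ranges, and intercept are placeholders.

```python
from math import exp


def max_effect_span(coefs, ranges):
    """Largest |beta| * (hi - lo) over all predictors; maps to 100 points."""
    return max(abs(coefs[n]) * (ranges[n][1] - ranges[n][0]) for n in coefs)


def points_for(name, value, coefs, ranges):
    """Points contributed by one predictor value (0 at its least risky end)."""
    lo, hi = ranges[name]
    beta = coefs[name]
    reference = lo if beta >= 0 else hi  # end of the axis that scores 0 points
    return 100.0 * beta * (value - reference) / max_effect_span(coefs, ranges)


def probability_from_points(total_points, intercept, coefs, ranges):
    """Convert total points back to a probability via the logistic function."""
    # Linear predictor when every predictor sits at its 0-point reference.
    base = intercept + sum(
        coefs[n] * (ranges[n][0] if coefs[n] >= 0 else ranges[n][1]) for n in coefs
    )
    z = base + total_points * max_effect_span(coefs, ranges) / 100.0
    return 1.0 / (1.0 + exp(-z))


# Placeholder model: UA raises risk, HDL lowers it.
coefs = {"UA": 0.01, "HDL": -1.0}
ranges = {"UA": (200.0, 600.0), "HDL": (0.5, 2.5)}
patient = {"UA": 400.0, "HDL": 1.5}
total = sum(points_for(n, v, coefs, ranges) for n, v in patient.items())
print(total, probability_from_points(total, -2.0, coefs, ranges))
```

The probability obtained from the points equals the one obtained by evaluating the regression equation directly, which is why the chart can replace the equation at the bedside.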

Discussion
Statistical studies show that the prevalence of GA has increased year by year, especially in developing countries, with a 2% increase in prevalence from 2017 to 2019 [22][23]. The incidence of GA in European and American countries was 0.13% to 0.37%, and the annual incidence is 0.20% to 0.35% [24]; the prevalence of HUA in the general population in China was about 10% [25]. According to our literature review and document retrieval, more than 80% of the published articles on disease prediction models for arthritis so far concern RA [26][27][28][29]. GA is thus not a research hotspot in the field of arthritis, which is not commensurate with its high incidence. To address this imbalance, this article studied a clinical diagnostic model and the IRFs associated with the progression of GA based on serum biochemical indicators.
We found a strong correlation between LDL and TC in all groups (LDL/TC correlation in the control, HUA, AGA, DIP, and CGA groups: 0.54, 0.51, 0.59, 0.58, and 0.55), which is consistent with literature reports [30]. The correlation was not affected by GA progression, indicating that LDL and TC are not IRFs for the progression of GA. Moreover, according to the line charts in Figure 2, hemoglobin remained essentially flat across the five stages of GA. Therefore, LDL, TC, and hemoglobin were excluded when establishing the diagnostic model. Figure 2 clearly indicates that BUN and creatinine were significantly elevated in the CGA group compared with the other five groups. The kidney is the main organ for excreting urea and, as with serum creatinine, BUN can remain in the normal range in the early stages of renal impairment. However, BUN and creatinine rise rapidly once the glomerular filtration rate (GFR) drops below 50% of normal. Studies have shown that chronically high levels of uric acid in the blood can significantly increase the risk of kidney disease [31], and 5.6% of 13338 participants (mean serum uric acid = 5.9 ± 1.5 mg/dL) had incident kidney disease, defined as a GFR decrease of more than 30%, over 8.5 years [32]. Another population-based cohort study showed that patients with CGA who were treated with urate-lowering therapy (ULT) had a greater risk of incident chronic kidney disease (CKD) [33]. Therefore, BUN and creatinine can be regarded as IRFs for the progression of GA, and a significant rise in BUN and creatinine indicates that the course of GA has entered the CGA stage. However, there was no significant difference in BUN and creatinine among the control, HUA, AGA, and DIP groups, so establishing a clinical diagnostic model for the progression of GA requires the contribution of other indicators.
In addition, BUN and creatinine can be considered diagnostic indicators of CGA, and they are serum biochemical indicators that can distinguish CGA from RA in addition to rheumatoid factor (RF).

Boxplots C and I of
Moreover, as can be seen in Figure 3, the biochemical profile of each group differed significantly from that of the control group, with the greatest difference between the CGA group and the control group; creatinine, age, ESR, BMI, hemoglobin, WBC, CRP, and UA contributed significantly to the differences between groups (VIP > 1). Furthermore, the serum biochemical profiles of CGA and RA were relatively different, indicating that the biochemical indicators selected in this study can adequately distinguish the 6 groups, comprising the 5 GA stages and RA. On this basis, we hypothesize that a shift of a patient's overall serum biochemical profile toward that of a given stage may lead to the appearance of symptoms. We therefore believe that the previous single-indicator diagnostic approach should be improved by adopting multiple indicators for comprehensive clinical diagnosis and prediction. Meanwhile, according to the correlation analysis, UA, creatinine, WBC, CRP, BUN, TG, ESR, and HDL were significantly correlated with the progression of GA, among which creatinine (0.62), ESR (0.36), BUN (0.32), WBC (0.31), and UA (0.23) had the strongest correlations. This is broadly consistent with the results of the multivariate statistics.
The training set (n=379) was used to build the clinical diagnostic model of GA progression (which consists of five phases), and the validation set (n=200) was used to evaluate and verify it. The AUCs of the five models were 0.9814 (control), 0.9288 (HUA), 0.9752 (AGA), 0.9056 (DIP), and 0.9759 (CGA). The Kappa coefficient used to evaluate consistency was 0.80, indicating substantial consistency of the model. Furthermore, the model was visualized with nomograms, which facilitates wider clinical application. Both doctors and patients can judge the development of and changes in GA according to the model, which provides an effective reference for distinguishing the stages of gouty arthritis.
Following this study, non-targeted metabolomics analysis will be performed on the serum samples of the GA patients collected above to obtain potential diagnostic biomarkers related to the progression of GA, including absolute qualitative and quantitative verification. The changes in biomarkers and serum biochemical indexes across the different stages of GA will then be used to optimize the clinical diagnostic model proposed in this paper. We expect to develop a new clinical diagnostic method for GA, similar to a "GA diagnostic kit", that can help clinicians predict the progress of GA faster and more accurately.
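A per-stage AUC of this kind can be sketched in Python as the probability that a randomly chosen patient in the stage receives a higher predicted score than a randomly chosen patient outside it (the rank-based definition of AUC). The study itself used pROC in R; `roc_auc` and the example scores are hypothetical. The last lines average the five AUCs reported above, which gives 0.9534; this agrees with the total AUC reported for the model, although the text does not state that the total was computed as an unweighted mean.

```python
from statistics import mean


def roc_auc(labels, scores):
    """One-vs-rest AUC: labels are truthy for the stage of interest."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up scores for illustration only.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))

# Per-stage AUCs as reported in the text.
reported_aucs = {
    "Control": 0.9814,
    "HUA": 0.9288,
    "AGA": 0.9752,
    "DIP": 0.9056,
    "CGA": 0.9759,
}
macro_auc = mean(reported_aucs.values())
print(round(macro_auc, 4))
```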

Study design and measurement
This prospective study was carried out between 30 November 2017 and 10 November 2019 at Zhejiang Provincial People's Hospital (ZPPH), the largest comprehensive first-class hospital in Zhejiang province, China. All participants were categorized as healthy control, HUA, AGA, DIP, or CGA based on the level of serum uric acid and the following definitions of GA (based on the 2015 ACR-EULAR Gout Classification Criteria) [34]: (1) healthy control was defined as serum uric acid of 150 to 416 μmol/L in males or 80 to 357 μmol/L in females, with no history of GA or other types of arthritis, such as rheumatic arthritis, rheumatoid arthritis, infectious arthritis, osteoarthritis, and psoriatic arthritis. (2) HUA was defined as serum uric acid > 416 μmol/L in males or > 357 μmol/L in females. (3) AGA was defined as persistent swelling and intense pain in the peripheral joints or bursae, with symptoms that peaked within 24 hours and resolved within 14 days. (4) DIP was defined as more than 4 weeks having passed since the last AGA attack without the patient having received urate-lowering therapy (ULT). (5) CGA was defined as joint swelling, tenderness, deformity, dysfunction, and subcutaneous tophi. (6) The synovial fluid of all patients in the HUA and AGA groups was examined by polarized-light microscopy to confirm the presence of MSU crystals.
In addition to hyperuricemia, the onset of GA is usually associated with age, BMI, sex, smoking, and alcohol drinking [35]. Therefore, demographic data, living habits, comorbidities (tumor and cardiovascular disease), disease durations, and medical situations were collected through questionnaires and case histories and are provided in Table 1. Serum uric acid levels and other blood biochemical indexes of the participants in each group were determined with the CHEMIX-180 automatic biochemistry analyzer (Sysmex Corp., Kobe, Japan). Each subject was seen once, and serum was collected only once during the study period.
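The sex-specific serum uric acid cut-offs in definitions (1) and (2) can be expressed as a small lookup, shown here as a hedged Python sketch. It covers only the uric acid thresholds, not the clinical criteria in (3) to (6); the function names and category strings are hypothetical. The unit conversion assumes a molar mass of 168.11 g/mol for uric acid, under which 7 mg/dL corresponds to roughly 416 μmol/L, the male threshold used above.

```python
URIC_ACID_MOLAR_MASS_G_PER_MOL = 168.11

# Thresholds in umol/L, taken from definitions (1) and (2) in the text.
CONTROL_LOWER = {"male": 150.0, "female": 80.0}
HUA_THRESHOLD = {"male": 416.0, "female": 357.0}


def mg_dl_to_umol_l(mg_dl):
    """Convert a uric acid concentration from mg/dL to umol/L."""
    # mg/dL -> mg/L (x10) -> mmol/L (/ molar mass) -> umol/L (x1000)
    return mg_dl * 10.0 / URIC_ACID_MOLAR_MASS_G_PER_MOL * 1000.0


def uric_acid_category(ua_umol_l, sex):
    """Classify a serum uric acid value against the sex-specific cut-offs."""
    if sex not in HUA_THRESHOLD:
        raise ValueError("sex must be 'male' or 'female'")
    if ua_umol_l > HUA_THRESHOLD[sex]:
        return "HUA range"
    if ua_umol_l >= CONTROL_LOWER[sex]:
        return "control range"
    return "below control range"


print(round(mg_dl_to_umol_l(7.0), 1))
print(uric_acid_category(420.0, "male"))
print(uric_acid_category(360.0, "female"))
```

A value in the HUA range is only one input to group assignment; the symptom-based definitions still decide between HUA, AGA, DIP, and CGA.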

Study population and statistical analysis
A total of 579 serum specimens were collected from ZPPH, including 379 in the training set: 80 healthy volunteers ( About 70% of the patients were inpatients and the other 30% were outpatients; the acute episodes of AGA lasted no less than 3 days, and the course of CGA was more than 11 years. Among the 207 patients with GA (mean age = 49.23±17.90), 188 (90.8%) were male and 19 (9.2%) were female. GA patients had a higher average BMI (25.88±3.01 vs. 22.9±2.7 kg/m²; p = 0.01) and uric acid (420.81±116.86 vs. 158.2±89.0; p = 0.001) than the control group. Meanwhile, the percentage of GA patients with smoking and alcohol drinking habits was much higher than that of the healthy volunteers (smokers: 55.8% vs. 34.4%; drinkers: 46.3% vs. 25.0%). A statistical description of the other biochemical indicators in all serum specimens is shown in Table 1, and the trend of the biochemical indicators across the five disease stages is shown as box plots (Figure 2). In addition, the raw data of the biochemical indicators in all serum specimens are provided in Table S1 (training set) and Table S2 (validation set). This sample size was sufficient to detect differences at a two-sided alpha level of 0.05 using Fisher's exact test. SIMCA 16.1 (Umetrics Inc., Sweden) was used for principal component analysis and partial least squares analysis in the 6 groups (control, HUA, AGA, DIP, CGA, and RA). SPSS 23.0 (SPSS Inc., Chicago, Illinois, USA) was used for one-way analysis of variance of the serum samples from the different patient groups. Pearson correlation coefficients (PCCs) were then calculated for the indexes that may affect the progression of GA using the corrplot package in R.
By applying code written by our research team and combining 7 R packages (MASS, pROC, ggplot2, mlogit, rms, corrplot, and survival), we carried out multiple logistic regression analysis, established the GA diagnostic model, and conducted verification and visualization.

Conclusion
In this study, serum biochemical indicators from clinical assessment were used to establish a clinical diagnostic model and a serum biochemical profile associated with the progression of GA. It is the first study to investigate the serum biochemical profile at different stages of gouty arthritis. An obvious difference in serum biochemical profile among the stages of GA was found, which can effectively distinguish them. We hypothesize that a shift of a patient's overall serum biochemical profile toward that of a given stage may lead to the appearance of symptoms. A simple evaluation tool was developed, and we have reason to believe it will be useful for clinicians and researchers wishing to consolidate their clinical diagnosis of the stage and tendency of GA in patients. This clinical diagnostic model can be applied clinically and in research to improve the accuracy of identifying and predicting the stages of GA.

Availability of data and materials
The following data are available in "Supplementary Materials": Table S1: Raw data of biochemical indicators in serum specimens of the training set (n=379); Table S2: Raw data of biochemical indicators in serum specimens of the validation set (n=200); Table S3: The result of one-way ANOVA; Table S4: Detailed correlation analysis data; Figure S1: The overview of study design; Figure S2: Matrix heat maps of correlation in 4 forms, generated in R with the corrplot package using code written by our research group.
The data that support the findings of this study are available from JUTCM but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of JUTCM.

Ethics approval and consent to participate
The medical ethics committee of ZPPH approved this study (acceptance number: KY2018048; approval number: 2018KY046). The ethics committee was constituted and operated in accordance with the principles of SFDA/GCP and the Declaration of Helsinki. Participants undergoing health screening were enrolled in the study at ZPPH from 2018 to 2019. Demographic data, living habits, comorbidities, disease durations, and medical situations were recorded in a standardized questionnaire.

Consent for publication
All patients consented to the publication of their case reports and of the personal data contained in the manuscript.