3.1 Summary of clinical characteristics
A total of 47 people were enrolled in this study, and RA patients consisted of 40 women, and normal volunteers (N) consisted of 7 women. The results showed that ESR, CRP, RF and anti-CCP were significantly different between the two groups (P<0.05) (Table 1). The DAS28 score, VAS score, number of tender joints, number of swollen joints and HAQ score of the control group were all 0 in healthy volunteers. Inflammatory indicators such as ESR and CRP in the RA patients were significantly higher than those in normal people (P<0.001). Immunological indices such as RF and anti-CCP antibodies were significantly higher in the RA patients than in the healthy subjects (P<0.001).
3.2 Species Diversity of the Gut microbiome
To evaluate gut microbial community richness, we combined rank abundance analysis with beta diversity analysis. The flatness of the broken line reflects the evenness of the community composition. The flatter the broken line is, the smaller the difference in community abundance and the higher the evenness of community composition (Figure 1A). In measuring beta diversity, UniFrac distances were used to estimate differences between samples or taxa. The PCoA analysis of the two-dimensional ranking chart showed the difference between the two groups, with Axis 1 as the main factor and Axis 2 as the secondary factor. Combined with the results before and after weighting, RA had a significant impact on the abundance and composition of the intestinal flora (Figure 1B).
3.3 Alterations in Microbial Composition Associated with RA
We analyzed which species mainly affected the differential distribution of the above differences (Figure 2). We drew a Venn diagram, and the results showed that the number of Nonsingleton amplicon sequence variants (ASVs) jointly owned by the two groups was 2193 (Figure 2A). Group RA and N had a certain degree of difference and similarity. More unique ASVs were detected in Group RA than in Group N, indicating that RA had a great influence on the number of gut flora of patients.
To further explore the taxonomic distribution of species in each region of the Venn diagram, we counted the ASV abundance at the phylum and genus levels. At the phylum level, the total abundance values of the two groups were similar. In particular, compared with Group N, the abundance of Proteobacteria and Fusobacteria in Group RA was relatively increased, while the abundance of Firmicutes and Bacteroidetes was relatively decreased. At the genus level, the total value of ASV abundance in Group RA was higher than that in Group N. In Group RA, the abundance of Bacteroides, Megamonas and Oscillospira increased, while the abundance of Prevotella, Faecalibacterium and Roseburia decreased (Figure 2B).
Bidirectional clustering was also used to further analyze the species abundance between groups to study more different species between Group RA and N (Figure 2C). By selecting different species, a heatmap of different species between the two groups was drawn. Lactococcus, Butyricicoccus, Epulopiscium, Megasphaera and Desulfovibrio were highly abundant in women with RA. In addition, we also studied the phylogenetic relationships of species at the genus level. Multiple sequence alignments were used to obtain phylogenetic trees constructed by the first 100 genera (Figure 2D), with branches representing their corresponding phyla. Stacked histograms outside the ring represent the abundance distribution information of the genus in different groups.
We used linear discriminant analysis (LDA) effect size analysis (LEfSe) to estimate the magnitude of the influence of each species abundance on the differential effect and then identified the species with significant differences between the two groups (Figure 3A). 49 taxa (LDA>2, P<0.05) were identified as significant discriminants. Roseburia, Sutterella, Gemmiger, Parabacteroides, Alistipes, Bifidobacterium and Barnesiella were the main taxa enriched in healthy people. RA patients had significantly fewer of these bacteria in their gut. Figure 3B shows the taxonomic hierarchy of the sample community from phylum to genus.
3.4 Random forest analysis and nested stratified cross-check analysis
Random forest analysis is a supervised machine learning algorithm based on the idea of integrated learning. It integrates more models, avoids limitations, and combines them to predict the final outcome. It is often used to screen biomarkers for intestinal flora[18-21]. We used it to provide further support for distinguishing RA from N groups. We used undrawn ASV tables and taxon abundance tables to conduct nest analysis cross-check analysis and 5-fold cross-check to score according to the importance of output species. Figure 3C shows that species descended in order of importance, so it can be considered that the most important species were markers of differences between groups. The genera with importance scores greater than 0.05 were Gemmiger, Bilophila, Odoribacter and Phascolarctobacterium.
The color shading in each row group identifies the predicted proportion of markup samples. The values for each row in the top half of the table were the proportion of the sample in the corresponding group that was correctly or incorrectly divided into each group. We evaluated the models according to model accuracy and baseline accuracy. Baseline accuracy was correlated with sample size in each group. Figure 3D shows that the comprehensive evaluation ability of this model was ideal.
3.5 Differentially Abundant Metabolites Between the RA and N Groups
Principal component analysis (PCA) can select a few important variables through linear transformation of multiple variables. To obtain high-quality metabolomics data, PCA diagram (Figure 4A) was drawn in this study. The dense distribution of samples indicated that the data measured in this study were reliable. Orthogonal projections to latent structures discriminant analysis (OPLS-DA) can effectively reduce the complexity of the model without reducing the predictive power of the model while enhancing the explanatory power of the model (Figure 4B). Most of the 40 samples in Group RA were concentrated in the left quadrant, and all 7 samples in Group N were concentrated in the right quadrant, indicating good separation between the two groups. There were significant differences in metabolites between RA and N group, indicating that the fitting accuracy of this model was high. It is necessary to conduct a displacement test on the model to observe whether the model exhibits overfitting. As shown in Figure 4C, there was no overfitting of the model.
3.6 Screening and identification of differential metabolites in patients with RA
As a nontargeted metabolomics project based on LC-MC, we identified and qualitatively analyzed metabolites based on retention time, primary mass spectrometry, secondary mass spectrometry and other information.
Variable importance in the projection (VIP) of OPLS-DA and P values of univariate statistical analysis (T test) were used to screen differential metabolites. VIP>1 and P<0.05 were used as reference criteria for screening differential metabolites. In the positive ion mode, phytolaccasaponin G, Lys-Phe-Lys, Pyroglu-Arg-Arg and sucralose were upregulated. In the negative ion mode, cinchonine, heptadecasphing-4-enine, N-acetyl-l-glutamate, deoxyinosine and 5,9-tetracosadienoic acid were upregulated (Figure 5A).
Receiver operating characteristic (ROC) curves were plotted for further analysis. The results ultimately showed that Lys-Phe-Lys and heptadecasphing-4-enine can be used as potential biomarkers for RA. The AUC of heptadecasphing-4-enine was 0.811 and that of Lys-Phe-Lys was 0.796. The P value of them was less than 0.05, indicating that the two biological indicators were strongly significant (Figure 5C, D).
3.7 Differential analysis of metabolic pathways
For KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis, Fisher's exact test was used to analyze and calculate the significance level of metabolite enrichment in each pathway. Fatty acid biosynthesis was found to be significantly different between RA patients and healthy women (Figure 5B).
3.8 Correlation analysis of intestinal differential metabolites, laboratory and intestinal differential flora
We performed Spearman correlation analysis of gut bacteria and metabolites at a subordinate level in women. From Figure S1 A, Gemmiger had an extremely strong positive correlation with Pyroglu-Arg-Arg, betonicine, benzoylmesaconine and protoporphyrinix (P<0.001). Gemmiger can produce short-chain fatty acids such as butyric acid, and we consider that these four metabolites may exert stronger anti-inflammatory effects with the help of Gemmiger. There was a strong positive correlation between L-pipecolic acid, 2 s-amino-4e-pentadecene-1, phytolaccasaponin G, neoline/bullatine b, Lys-Phe-Lys, poncirin and Parabacteroides. Afridi R et al. found that poncirin had good analgesic potential under the condition of inflammatory pain[22]. Parabacteroides, as beneficial intestinal bacteria, could have synergistic anti-inflammatory effects with poncirin[23-25].
As shown in Figure S1 B, probucol had a negative correlation with Faecalibacterium, Oscillospira, Ruminococcus, and Parabacteroides. Faecalibacterium can produce butyrate, and studies have confirmed that the abundance of Faecalibacterium decreases in RA patients[26,27], which can further affect our immune system through the gut. Since Oscillospira can produce short-chain fatty acids, we speculate that probucol may interfere with the function of Oscillospira in protecting the intestine and indirectly affect the occurrence of RA[28-30]. Many studies have shown that the abundance of Ruminococcus significantly increased in the intestinal tract of patients with systemic lupus erythematosus, Crohn's disease and other immune diseases[31,32]. Studies have shown that Ruminococcus is positively correlated with RF-IgA and anti-CCP antibodies[33]. Ruminococcus abundance was significantly increased in female RA patients, while the specific relationship between probucol and Ruminococcus was not clear. Roseburia has a positive effect in maintaining homeostasis in the gut and may help 5,9-tetracosadienoic acid play an anti-inflammatory role.
Figure S1 C shows that RF had an extremely strong positive correlation with Megamonas, Dialister and SMB53 (P<0.001). ESR had a relatively strong positive correlation with Streptococcus, Lactobacillus and SMB53 (P<0.01). There was an extremely strong negative correlation between Sutterella and DAS28.