Data collection and acquisition.
Gene expression data of IgAN patients were obtained from the NCBI Gene Expression Omnibus public database (GEO, www.ncbi.nlm.nih.gov/geo/). The GSE93798 dataset, containing 22 healthy control and 20 IgAN kidney tissue samples were selected. Probes were converted to gene symbols according to the GPL22945 platform (Affymetrix Human Genome U133 Plus 2.0 Array). GSE37460 annotated by GPL14663 (Affymetrix GeneChip Human Genome HG-U133A Custom CDF) and GPL11670 (Affymetrix Human Genome U133 Plus 2.0 Array) included 27 IgAN samples and 27 health control sample. Datasets GSE141295 and GSE104948 were used for verification sets. All data were freely available.
Data Processing and Differential Expression Analysis
Two datasets of GSE93798 and GSE37460 microarray normalized expression matrix and annotate the probes using the dataset’s annotation file. Using the ComBat algorithm of R software “SVA” package for batch correction of datasets GSE93798 and GSE37460. After ID conversion, Differential expression analysis of IgAN and health control samples was performed using the “limma” package in R software. Genes with p < 0.01 and |log2FC| > 2.0 were considered as differential expression genes (DEGs). Heat maps and volcano plots of DEGs were created using the “pheatmap” and “ggplot2” packages.
Weighted Correlation Network Analysis (WGCNA)
WGCNA was carried out according to the method in the literature[18]. WGCNA was applied to analyze differential genes in GSE93798 and GSE37460. We used gene expression profiles, then calculated the MAD (Median Absolute Deviation) of each gene separately, eliminated the top 50% of genes with the smallest MAD. The correlation coefficient between each gene pair was calculated to construct a similarity matrix. To ensure the construction of a scale-free network, a suitable soft threshold was chosen to transform the similarity matrix into an adjacency matrix. Subsequently, the adjacency matrix was transformed into a topological overlap matrix (TOM). Based on the relevant parameters of the blockwiseModules function, genes with similar expression profiles were grouped into different modules using the dynamic tree cutting method. The module eigengene (ME) and the module membership (MM) were used to distinguish the vital modules. The module with the highest absolute value of the correlation coefficient was identified as the key module for further analysis. Module membership (MM) refers to the correlation coefficient between genes and module eigengenes. Gene significance (GS) was the correlation coefficient between the expression value of a gene and a phenotype, representing the association between genes and phenotypes[19].
Fatty acid Metabolism-Related Genes (FAM)
We extracted two fatty acid metabolism-related gene sets from the Molecular Signatures Database v7.5.1 (MSigDB; www.gsea-msigdb.org), including Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (Pathway: hsa00071) and Reactome database (R-HSA-8978868), and they are presented in Supplementary 1.
Identification of Hub Genes
We identified overlapping 12 hub genes associated with both fatty acid metabolism and IgAN (FAM-IgAN), the intersection of DEGs, genes obtained by WGCNA, and genes in the fatty acid metabolism gene sets were taken using the “VennDiagram” package in R software.
Functional enrichment, pathway analysis of hub genes
Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the “clusterProfiler” R package. Adjusted p < 0.05 was selected as the cut-off criterion. The respective functions of each hub genes were revealed by Gene Set Enrichment Analysis (GSEA). IgAN patients were differentiated into two groups (the low expression group and the high expression group) based on median values of hub gene expression levels. We computed the consistency p-value for each gene set, and p-values less than 0.05 were considered significantly enriched. All these analyses were conducted using the “clusterProfiler” package in R software.
Construction of the Fatty acid Metabolism-Related risk signature
The least absolute shrinkage and selection operator (LASSO) logistic regression was used to classify the diagnostic markers of IgAN using the “glmnet” package, with a regression penalty score of 0.03679984165. Then the risk score of the predictive model was calculated. The diagnostic accuracy was measured using receiver operating characteristic curves (ROCs) and the area under the ROC curve (R package: survivalROC).
Evaluation of subtype distribution among immune‑infiltrated cells.
CIBERSORT was shown to transform a normalized gene expression matrix into the composition of 22 immune cell types based on a deconvolution algorithm. Here, CIBERSORT was employed to calculate the composition of immune cells in IgAN and healthy samples.
Correlation and differential analysis of immune‑infiltrated cells.
The correlations between hub genes expression and immune cell infiltration were determined using Pearson correlation coefficients (r) > 0.4 and P < 0.05. Pearson correlation coefficient was obtained from the sample data screened by CIBERSORT, P < 0.05.
HMC culture and treatment
HMCs were obtained from Pricella (CP-H067, Shanghai, China) and cultured in RPMI-1640 supplemented with 10% FBS, 100 U/mL penicillin, and 100mg/ml streptomycin. HMCs treated with extracted Gd-IgA1 to established IgAN model in vitro reported previously[15]. When HMCs were cultured om serum-free medium and treated with 1mg/ml Gd-IgA1 for 24 h after cell confluence reached 70-80%.
Human subjects
A total of 25 patients who had biopsy-proven IgAN and enrolled in the Xinhua Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China, between July 2021 and July 2023. Health controls (n=25) were selected from age-match normal individuals from the medical examination center.
RNA extraction and real-time PCR
The total RNA from peripheral blood and HMCs were extracted by Trizol reagent (Tiangen Biotech, Beijing, China). We used a cDNA synthesis kit (HiScript II Q RT SuperMix for qPCR, Vazyme, China) to compound cDNA according to the standard instructions. The cDNA was analyzed in an ABI ViiA 7 real-time polymerase chain reaction system to detect mRNA expression for RT-PCR. All the primers were synthesized by Sangon Biotech (Sangon, Shanghai, China). The relative expression level of mRNA was calculated by 2-∆∆Ct method and three replicate experiments were involved. The internal control GAPDH was used for correction. All primers were used as summarized in Supplementary 2.
Statistical Analysis
All statistical analyses were performed in R software. The t-test and Mann–Whitney U-test were selected according to whether the data conformed to a normal distribution. Univariate and multivariate logistic regression analysis were utilized to assess the diagnostic value of the predictive mode. Significance was usually defined as p < 0.05.