Characterization of metabolome-wide biomarkers associated with kidney function in the Chinese population


Background: Chronic kidney disease (CKD) is a global public health problem. Identifying sensitive filtration biomarkers has key diagnostic value and contributes to an understanding of CKD at the molecular level. Metabolomics provides a snapshot of the biochemical activity of the human body at a particular time in the progression of CKD. This metabolome-wide biomarker study examined whether plasma or serum metabolite profiles differ significantly across CKD stages and characterized potential markers for assessing kidney function in the Chinese population.
Methods: Plasma and serum metabolites were analyzed using ultrahigh-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) in 198 participants (53 serum samples and 145 plasma samples) selected on the basis of their measured GFR (by iohexol plasma clearance). Participant recruitment began in late 2019, and untargeted metabolomics assays (N = 214) were conducted at the Calibra-Metabolon Joint Laboratory (Hangzhou, China) using Metabolon's HD4 Discovery untargeted metabolomics platform in early 2021.
Results: Many metabolites were related to mGFR, and the top 30 were selected by the random forest method. Of these, 15 amino acids, 8 nucleotides, and 2 carbohydrates were strongly related to kidney function in the combined group (serum and plasma). Thirteen amino acids, 9 nucleotides, and 3 carbohydrates were identified in the plasma group, while 13 amino acids, 7 nucleotides, and 3 carbohydrates were found in the serum group. Ten of the top 15 ranked metabolites were concordant between the plasma and serum groups. Major differences in metabolite profiles were observed with increasing CKD stage, including altered tryptophan metabolism and pyrimidine metabolism.
Conclusions: Global metabolite profiling of plasma and serum uncovered potential biomarkers of CKD stage. In addition, these novel biomarkers provide insight into possible pathophysiologic processes that may contribute to the progression of CKD and to a higher risk of comorbidities and mortality.


Background
Chronic kidney disease (CKD) is a growing burden worldwide and a major public health concern; it affects approximately 10% of the population and substantially increases the risk of cardiovascular morbidity and mortality [1][2][3]. Biomarkers help clinicians make more appropriate diagnoses and choose treatments. Creatinine is the well-established biomarker for assessing kidney function [4].
However, creatinine is not a sensitive marker and does not allow early detection [5,6]. Blood metabolite concentrations are well known to be influenced by kidney function, and a few metabolites are used to estimate it; for example, creatinine is used to estimate the glomerular filtration rate (GFR).
Creatinine also has other limitations: it rises only after nearly half of kidney function has been lost, and it depends on race, reflecting underlying differences in muscle mass. Assessment of GFR is vitally important for evaluating renal function in clinical practice and public health, and measured GFR (mGFR) remains the reference standard ("gold standard") [8,9].
Using a panel of filtration markers can improve precision, reduce errors caused by variation in the non-GFR determinants of each marker, and decrease the need to use race and clinical characteristics as surrogates for those non-GFR determinants [10,11]. Our study aims to identify new metabolite biomarkers that could improve GFR estimation, including novel glomerular filtration-related markers that perform as well as or better than creatinine.
Most metabolites are eliminated by the kidney, so changes in serum or plasma metabolite concentrations may reflect impaired renal function and can be used to estimate GFR. In this study, GFR was measured as the plasma clearance of iohexol [12][13][14], and estimated GFR (eGFR) based on creatinine and cystatin C levels was assessed at the same time [7]. Our study thus yields a broad list of metabolites associated with mGFR and highlights potential novel filtration markers that may help improve GFR estimation.
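The sampling schedule and the exact clearance calculation are not described here. As a minimal sketch, assuming the common slope-intercept approach, plasma clearance is the injected dose divided by the area under the concentration-time curve; fitting a single exponential C(t) = C0·exp(-kt) to late plasma samples gives AUC = C0/k. The function name, units, and example values below are illustrative assumptions, not the study's protocol.

```python
import math


def slope_intercept_clearance(times_min, conc_mg_per_l, dose_mg):
    """Plasma clearance (mL/min) from a mono-exponential fit to late iohexol samples.

    Fits ln C = ln C0 - k*t by ordinary least squares, then uses
    AUC = C0 / k so that clearance = dose / AUC.
    """
    n = len(times_min)
    logs = [math.log(c) for c in conc_mg_per_l]
    t_mean = sum(times_min) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    slope = sxy / sxx
    k = -slope                                # elimination rate constant (1/min)
    c0 = math.exp(y_mean - slope * t_mean)    # back-extrapolated concentration at t = 0 (mg/L)
    auc = c0 / k                              # area under the curve (mg*min/L)
    return dose_mg / auc * 1000.0             # L/min -> mL/min


# Hypothetical samples at 2, 3 and 4 h after a 1000 mg dose.
times = [120.0, 180.0, 240.0]
conc = [100.0 * math.exp(-0.01 * t) for t in times]
print(slope_intercept_clearance(times, conc, dose_mg=1000.0))
```

Because the slope-intercept method ignores the early distribution phase, it overestimates GFR, and a correction such as Brøchner-Mortensen's is normally applied afterwards; that step, and normalization to body surface area, are omitted from this sketch.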
Despite the high incidence and increasing prevalence of CKD, the underlying pathophysiologic mechanisms have not yet been elucidated. The identification of novel biomarkers of kidney function therefore remains clinically useful, as shown by the recent addition of a second marker, cystatin C, which improves the accuracy of GFR estimation and predicts future risk of end-stage renal disease (ESRD) and death [15].
Blood metabolite levels change as CKD progresses, which has stimulated interest in the metabolome in nephrology. Previous nontargeted and targeted metabolomics studies were mostly conducted in non-Chinese populations, and a few new biomarkers have been identified [16][17][18][19][20]. The goal of our study was to identify and replicate novel and known metabolites that reproducibly associate with mGFR, to characterize these biomarkers, to gain additional insight into the pathophysiology of CKD in the Chinese population, and to explore differences among racial groups.

Study Participants
A total of 198 participants (96 females, 48.5%), including 10 healthy volunteers and 188 CKD patients with varying degrees of renal dysfunction, were enrolled in this study. The mean age was 58.2 ± 18.5 years (range 18-96 years). The mean body mass index was 24.2 ± 4.2 kg/m² (range 15.0-48.6 kg/m²). The mean serum creatinine was 1.82 ± 2.01 mg/dL (range 0.42-10.79 mg/dL), and the mean cystatin C was 1.17 ± 1.16 mg/L (range 0.62-5.82 mg/L). Plasma samples were obtained from Kiang Wu Hospital, Macao, and serum samples were obtained from the Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China. All samples were stored at -80°C for this study. The 198 participants (53 serum samples and 145 plasma samples), selected on the basis of their measured GFR (by iohexol plasma clearance), underwent untargeted metabolomics assays (N = 214) at the Calibra-Metabolon Joint Laboratory (Hangzhou, China) using Metabolon's HD4 Discovery untargeted metabolomics platform in early 2021.
This study was approved by the local ethics committee, and all participants were informed about the study and signed the consent form.

Metabolomic analysis
Sample Accessioning
Each sample received was accessioned into the mLIMS system and assigned a unique identifier that was associated with the original source identifier only. This identifier was used to track all sample handling, tasks, and results. The samples (and all derived aliquots) were tracked by the LIMS: whenever a new task was created, every portion of a sample was automatically assigned its own unique identifier, and the relationships between these portions were also tracked. All samples were maintained at -80°C until processed.
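The accessioning scheme amounts to a registry that issues opaque identifiers and records which portion was derived from which. The sketch below is a hypothetical, minimal stand-in for that bookkeeping; `SampleRegistry`, its methods, and the ID format are invented for illustration and do not describe the internals of Metabolon's mLIMS.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SampleRegistry:
    """Hypothetical stand-in for LIMS accessioning and aliquot tracking."""

    _ids: count = field(default_factory=lambda: count(1))
    source_of: dict = field(default_factory=dict)   # internal ID -> original source ID
    parent_of: dict = field(default_factory=dict)   # derived portion ID -> parent ID

    def _new_id(self) -> str:
        # Opaque sequential ID; it carries no information about the source sample.
        return f"S{next(self._ids):06d}"

    def accession(self, source_id: str) -> str:
        """Register a received sample and link it to its source identifier only."""
        internal_id = self._new_id()
        self.source_of[internal_id] = source_id
        return internal_id

    def derive_aliquot(self, parent_id: str) -> str:
        """Register a new portion of an existing sample (e.g. when a task is created)."""
        child_id = self._new_id()
        self.parent_of[child_id] = parent_id
        self.source_of[child_id] = self.source_of[parent_id]
        return child_id

    def lineage(self, sample_id: str) -> list:
        """Chain of IDs from this portion back to the accessioned sample."""
        chain = [sample_id]
        while chain[-1] in self.parent_of:
            chain.append(self.parent_of[chain[-1]])
        return chain
```

For example, `reg.derive_aliquot(reg.accession("SRC-001"))` registers a sample under a hypothetical source ID and one derived aliquot, and `reg.lineage(...)` walks the aliquot back to the accessioned sample.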

Sample Preparation
Samples were prepared using the automated MicroLab STAR® system from Hamilton Company. Several recovery standards were added prior to the first step in the extraction process for QC purposes. To remove proteins, dissociate small molecules bound to proteins or trapped in the precipitated protein matrix, and recover chemically diverse metabolites, proteins were precipitated with methanol under vigorous shaking for 2 min (Glen Mills GenoGrinder 2000) followed by centrifugation.

UPLC-MS/MS methods
The resulting extract was divided into five fractions: two for analysis by two separate reverse-phase (RP)/UPLC-MS/MS methods with positive ion mode electrospray ionization (ESI), one for analysis by RP/UPLC-MS/MS with negative ion mode ESI, one for analysis by HILIC/UPLC-MS/MS with negative ion mode ESI, and one reserved as a backup. Samples were placed briefly on a TurboVap® (Zymark) to remove the organic solvent, and the sample extracts were stored overnight under nitrogen before preparation for analysis.

Data Extraction and Compound Identification
Raw data were extracted, peak-identified, and QC-processed using Discovery HD4 hardware and software. These systems are built on a web-service platform using Microsoft's .NET technologies, which run on high-performance application servers and fiber-channel storage arrays in clusters to provide active failover and load balancing. Compounds were identified by comparison with library entries of purified standards or recurrent unknown entities.

Discovery HD4
Discovery HD4 maintains a library based on authenticated standards that contains the retention time/index (RI), mass-to-charge ratio (m/z), and chromatographic data (including MS/MS spectral data) for all molecules in the library. Biochemical identifications are based on three criteria: a retention index within a narrow RI window of the proposed identification, an accurate mass match to the library within ±10 ppm, and the MS/MS forward and reverse scores between the experimental data and authentic standards. The MS/MS scores are based on a comparison of the ions present in the experimental spectrum with the ions present in the library spectrum.
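The three identification criteria can be expressed as successive filters over library entries. This is a minimal sketch under stated assumptions: the RI window (50 units), the fragment m/z tolerance (0.01), the minimum score (0.5), and the use of a simple "fraction of ions matched" in each direction as forward and reverse scores are all hypothetical stand-ins; the platform's actual scoring functions are not described in this paper.

```python
from dataclasses import dataclass


@dataclass
class LibraryEntry:
    name: str
    ri: float            # retention index of the authenticated standard
    mz: float            # reference mass-to-charge ratio
    ions: tuple = ()     # reference MS/MS fragment m/z values


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Mass error in parts per million relative to the library value."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def ion_overlap(query, reference, tol=0.01):
    """Fraction of ions in `query` that have a partner in `reference` within `tol`."""
    if not query:
        return 0.0
    return sum(any(abs(q - r) <= tol for r in reference) for q in query) / len(query)


def identify(ri, mz, ions, library, ri_window=50.0, ppm_tol=10.0, min_score=0.5):
    """Names of library entries that pass all three identification criteria."""
    hits = []
    for entry in library:
        if abs(ri - entry.ri) > ri_window:            # criterion 1: narrow RI window
            continue
        if abs(ppm_error(mz, entry.mz)) > ppm_tol:    # criterion 2: accurate mass, +/-10 ppm
            continue
        forward = ion_overlap(ions, entry.ions)       # experimental ions found in library
        reverse = ion_overlap(entry.ions, ions)       # library ions found in experiment
        if min(forward, reverse) >= min_score:        # criterion 3: MS/MS scores
            hits.append(entry.name)
    return hits
```

Applying the cheap RI and mass filters first and the spectral comparison last keeps the number of MS/MS comparisons small; the compound names and values one would pass in are placeholders.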

Statistical Methods and Terminology
Values are expressed as mean ± standard deviation (SD). Group comparisons were carried out with the chi-square or Mann-Whitney test; mean values were compared using one-way analysis of variance (ANOVA), and proportions using chi-square tests. A significance level of p < 0.05 was used in all tests, and these analyses were performed with IBM SPSS Statistics 22 for Mac.
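The analyses were run in SPSS; the one-way ANOVA used to compare means across groups reduces to the ratio of between-group to within-group mean squares. The pure-Python sketch below computes that F statistic and its degrees of freedom for illustration only; the p-value would then come from the F distribution (for example `scipy.stats.f.sf(F, df_between, df_within)`), which is not reimplemented here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative values for three hypothetical groups.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```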

Random Forest
Random forest is a supervised classification technique based on an ensemble of decision trees [20]. For a given decision tree, a random subset of the data with known true class information is selected to build the tree (the "bootstrap sample" or "training set"), and the remaining data, the "out-of-bag" (OOB) samples, are then passed down the tree to obtain a class prediction for each sample. Because the prediction for each sample is based only on trees built from subsets of samples that do not include that sample, this gives an unbiased estimate of prediction accuracy. To determine which variables (biochemicals) contribute most to the classification, a "variable importance" measure is computed; we use the "Mean Decrease Accuracy" (MDA) as this metric. The MDA is determined by randomly permuting the values of a variable, running the permuted values through the trees, and reassessing the prediction accuracy; the larger the resulting decrease in accuracy, the more important the variable.
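The OOB and MDA logic can be sketched in plain Python. To keep the example short, each "tree" is a single-split decision stump grown on a bootstrap sample from a random subset of `mtry` features, so this is a simplified bagged ensemble rather than a full random forest. The function names, defaults, and toy data are hypothetical, and this is not the implementation used for the analyses in this study.

```python
import random
from collections import Counter


def majority(labels, default):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, rows, features):
    """Depth-1 tree: the best single threshold split on the bootstrap rows."""
    fallback = majority([y[i] for i in rows], None)
    best = (-1.0, features[0], float("inf"), fallback, fallback)
    for f in features:
        for t in sorted({X[i][f] for i in rows}):
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            lc, rc = majority(left, fallback), majority(right, fallback)
            acc = (left.count(lc) + right.count(rc)) / len(rows)
            if acc > best[0]:
                best = (acc, f, t, lc, rc)
    _, f, t, lc, rc = best
    return lambda x: lc if x[f] <= t else rc


def forest_oob_mda(X, y, n_trees=100, mtry=1, seed=0):
    """OOB accuracy and Mean Decrease Accuracy per feature (rows of X must be lists)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [Counter() for _ in range(n)]
    drops = [[] for _ in range(p)]
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]    # bootstrap sample ("training set")
        oob = sorted(set(range(n)) - set(boot))         # samples this tree never saw
        if not oob:
            continue
        tree = fit_stump(X, y, boot, rng.sample(range(p), mtry))
        for i in oob:
            votes[i][tree(X[i])] += 1
        base = sum(tree(X[i]) == y[i] for i in oob) / len(oob)
        for f in range(p):
            permuted = [X[i][f] for i in oob]
            rng.shuffle(permuted)                       # break the link between feature f and class
            hits = sum(
                tree(X[i][:f] + [v] + X[i][f + 1:]) == y[i]
                for i, v in zip(oob, permuted)
            )
            drops[f].append(base - hits / len(oob))     # accuracy lost by permuting feature f
    scored = [i for i in range(n) if votes[i]]
    oob_acc = sum(votes[i].most_common(1)[0][0] == y[i] for i in scored) / len(scored)
    mda = [sum(d) / len(d) if d else 0.0 for d in drops]
    return oob_acc, mda
```

On toy data where only the first feature carries class information, that feature should receive the larger MDA. Full random forest implementations grow deep trees and may normalize MDA differently, so absolute values from this sketch are not comparable with those in Figs. 1-3.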

Result
We applied untargeted ultrahigh-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) to determine metabolomic profiles in 198 participants (53 serum samples and 145 plasma samples), and we measured the participants' mGFR by iohexol plasma clearance. The study population was grouped by mGFR (Table 1). Serum and plasma were analyzed both separately and combined.
A total of 198 participants (102 females, 51.5%), including 10 healthy volunteers and 188 CKD patients with varying degrees of renal dysfunction, were enrolled in this study. Table 2 shows the clinical characteristics of the study population. There were no group differences in sex distribution, body mass index (BMI), or blood pressure; however, the group with impaired renal function had a higher prevalence of comorbidities (hypertension, diabetes, and coronary artery disease) and was older than the group with normal kidney function.

Global Metabolite Determination and Significantly Altered Biochemicals
The present dataset comprises a total of 1094 compounds of known biochemical properties. A subset of these metabolites differed significantly with CKD stage progression (p < 0.05), and an additional set approached significance (0.05 < p < 0.10) (Tables 3-5). Two-way ANOVA identified biochemicals with significant interaction and main effects for the experimental parameters of disease status and sample type. An estimate of the false discovery rate (q-value) was calculated to account for the multiple comparisons that normally occur in metabolomics-based studies.
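The section does not state which q-value procedure was used. As a minimal sketch of how a false discovery rate adjustment works, the Benjamini-Hochberg procedure is shown below as a stand-in; it is not necessarily the method applied to this dataset, and the p-values are made up.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

With roughly a thousand compounds tested, a raw p < 0.05 threshold would be expected to yield dozens of false positives by chance, which is why q-values are reported alongside p-values.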

High-Level Metabolite Overview
Principal component analysis (PCA) is a mathematical dimension-reduction procedure that allows differences across a large set of variables to be represented by a smaller set of variables. PCA permits visualization of how individuals within a group cluster with respect to their data-compressed principal components, and it therefore helps determine whether samples segregate based on differences in their overall metabolite signature. As shown in Supplementary Figs. 5-8, the samples separated completely.
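The dimension reduction can be illustrated with a minimal pure-Python sketch that finds the first principal component by power iteration on the sample covariance matrix and projects each sample onto it. This is an illustration under assumptions, not the software used in this study: metabolomics PCA is commonly run on log-transformed, scaled intensities with a linear-algebra library, and power iteration as written assumes a well-separated leading eigenvalue and a starting vector that is not orthogonal to it.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Leading PCA direction and per-sample scores via power iteration on the covariance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)] for a in range(p)]
    v = [1.0] * p                                   # starting vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]                   # renormalize each step
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]  # PC1 coordinate per sample
    return v, scores
```

Plotting the scores of the first two or three components, coloured by group or sample type, gives the kind of score plot referred to in the supplementary figures.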

Identification of Top-Ranking Metabolite Changes
Random forest analysis is an unbiased, supervised classification technique based on an ensemble of many decision trees. In addition to a metric of predictive accuracy, it produces an associated ranking of biochemicals in order of their importance to the classification scheme. In this study, random forest analysis was used to identify metabolites that differentiated samples among the four groups. A predictive accuracy of 80.8% was obtained for the combined plasma + serum data set (Fig. 3), 82.1% for the plasma data set (Fig. 1), and 84.9% for the serum data set (Fig. 2), compared with 25% expected by random chance alone. The random forest analysis performed much better than expected by chance, suggesting that there are significant metabolic differences that can discriminate samples among the four groups, with metabolites in the amino acid and nucleotide super pathways being most important in all three models. The top 30 biochemicals that contributed to the separation of the groups are shown in Figs. 1-3.
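As a rough check of the comparison with chance, one can ask how likely the reported accuracies would be if each sample were classified correctly with probability 0.25. This back-of-the-envelope sketch is not an analysis reported in the study: it converts the reported percentages back to counts, treats predictions as independent trials, and takes 25% as the chance level, which assumes four equally sized groups.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Reported random forest accuracies and sample counts; chance level taken as 1/4.
datasets = {"combined": (0.808, 198), "plasma": (0.821, 145), "serum": (0.849, 53)}
tails = {}
for name, (accuracy, n) in datasets.items():
    correct = round(accuracy * n)   # e.g. 0.808 * 198 rounds to 160 correctly classified samples
    tails[name] = binomial_upper_tail(correct, n, 0.25)
    print(f"{name}: {correct}/{n} correct, P(X >= {correct} | chance) = {tails[name]:.1e}")
```

Under these simplifying assumptions, all three accuracies are far in the upper tail of what guessing would produce, which is consistent with the statement that the models performed much better than chance.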
Many metabolites were related to mGFR, and we used the random forest method to select the top 30 metabolites most strongly associated with mGFR. As shown in Fig. 3, we found that 15 amino acids, 8 nucleotides, and 2 carbohydrates were strongly related to kidney function in the combined group (serum and plasma). Thirteen amino acids, 9 nucleotides, and 3 carbohydrates were identified in the plasma group (Fig. 1), while 13 amino acids, 7 nucleotides, and 3 carbohydrates were found in the serum group (Fig. 2). We observed that 10 of the top 15 ranked metabolites were concordant between the plasma and serum groups.

Panel of markers related to renal function
In addition to the classical clinical markers of kidney dysfunction, other small-molecule metabolic markers of kidney function have been studied. In this study, in addition to creatinine and urea, a few potential biomarkers were identified, such as pseudouridine. In particular, 1,5-anhydroglucitol (1,5-AG) may be a novel negative biomarker of kidney disease. Because one function of the kidney is to remove toxins and excess metabolites from the body, kidney dysfunction can result in the buildup of many endogenous and exogenous metabolites. One way the body facilitates elimination is phase II metabolism, which generally increases hydrophilicity and urinary elimination. Large increases in sulfated and glucuronidated metabolites, both endogenous and exogenous, were observed in this study, most notably among metabolites in the aromatic amino acid, benzoate, food component, and chemical subpathways (Fig. 4).

Discussion
The kidneys filter waste products from the blood and produce urine, and kidney function is frequently measured as the glomerular filtration rate (GFR). Recent advances in mass spectrometry methodology have allowed comprehensive studies of the metabolome and its relationship with kidney function [21][22][23][24][25]. Metabolomics studies can identify and quantify hundreds to thousands of metabolites in a given sample. Accordingly, heat maps and other plots, as well as principal component analysis and random forest analysis, were produced for the entire data set, the plasma-only data set, and the serum-only data set. Major differences in metabolite profiles were observed across the various severities of CKD: a large number of biochemicals increased with the progression of CKD, whereas a small number decreased. These differences may reveal stage-specific biomarkers of CKD.
In this study, we found a number of metabolites associated with mGFR and used the random forest method to select the top 30 metabolites most strongly related to mGFR (Figs. 1-3). Ten of the top 15 ranked metabolites were concordant between the plasma group and the serum group, which suggests that serum and plasma samples may give generally consistent metabolomic results, although more studies are needed to verify this. Sekula et al. [26] reported 56 metabolites that replicated as associated with eGFRcr, including 6 metabolites that were consistently strongly correlated with eGFRcr (pseudouridine, C-mannosyltryptophan, N-acetylalanine, erythronate, myo-inositol and N-acetylcarnosine). Coresh et al. [16] reported a panel of candidate novel filtration markers that included pseudouridine, acetylthreonine, myo-inositol, phenylacetylglutamine and tryptophan, and found high correlations with mGFR (including all of the above metabolites except N-acetylcarnosine).
In our study, C-glycosyltryptophan (also known as C-mannosyltryptophan), pseudouridine, N-acetylalanine, erythronate, myo-inositol and even N-acetylcarnosine were highly correlated with mGFR, whereas acetylthreonine was not. N-acetylcarnosine and pseudouridine levels increased markedly with increasing nephropathy in our study, consistent with the results reported by Sekula et al. [26]. N-acetylated amino acids and their derivatives, such as N-acetylalanine and N-acetylcarnosine, can be indicators of protein turnover.
Pseudouridine is a derivative of uridine and a modified nucleoside found in RNA. Notably, pseudouridine ranked in the top five in each of the studies above, suggesting that it may be an ideal biomarker: a stable indicator that does not depend on race.
Additionally, N6-carbamoylthreonyladenosine and hydroxyasparagine were unique to this study. Hydroxyasparagine, also known as β-hydroxyasparagine (beta-hydroxyasparagine), is a modified form of the amino acid asparagine and is associated with mGFR and CKD; however, little is known about this metabolite. It arises from posttranslational modification of EGF-like domains, which can occur in humans and other eukaryotes, and the modified residue is found in fibrillin-1 [27]. C-glycosyltryptophan was associated with mGFR and CKD, as well as with prospective endpoints of eGFR decline, incident CKD and ESRD [26].
In particular, 1,5-anhydroglucitol (1,5-AG) is a potential negative biomarker of kidney disease. In the kidney, 1,5-AG is filtered by the glomerulus, and most of it is reabsorbed into the blood in the proximal tubule. Glucose, which trended higher in the moderate and severe nephropathy groups than in the normal kidney function group, is a competitive inhibitor of this reabsorption; thus, when glucose rises, more 1,5-AG is excreted in the urine, lowering blood levels. The level of 1,5-AG was lower in severe nephropathy than in the three other groups, and lower in moderate than in mild nephropathy. A recent study suggested that 1,5-AG may also have prognostic value for cardiovascular events in patients with coronary heart disease [28].
Creatine kinase catalyzes both the transfer of a high-energy phosphate from ATP to creatine and the regeneration of ATP from creatine phosphate and ADP. In solution, creatine slowly and spontaneously cyclizes to creatinine, which is eliminated in the urine and can be used as a marker of kidney function. Creatinine levels increase with increasing nephropathy. The urea cycle is necessary for organisms to detoxify and safely eliminate waste ammonia generated from the catabolism of amino acids. Urea also increases with increasing nephropathy.
In addition to removing waste from the body, the kidney regulates electrolyte and fluid volume, so derangements in molecules necessary for osmotic regulation are expected with nephropathy. As seen in Fig. 4, small molecules involved in osmotic regulation, such as the inositol metabolites myo-inositol and chiro-inositol, erythronate, and trimethylamine N-oxide (TMAO), increased with increasing nephropathy. Other metabolites, such as 3-indoxyl sulfate, phenylacetylglutamine, 1-methylguanidine, and guanidinosuccinate, have been shown to be uremic toxins or metabolites that accumulate during uremia, and all four showed increasing concentrations with increasing nephropathy. Guanidine (G), 1-methylguanidine (MG), and 1,1-dimethylguanidine (DMG) have long been implicated as uremic "toxins." N-acetylalanine-aminopeptidase is a recently described enzyme found in human erythrocytes; its activity has not been found in the cytosolic compartment of highly purified human leukocytes [29], and its physiological function in erythrocytes is still unknown.
Tryptophan, an essential aromatic amino acid, feeds into a number of processes, including production of the neurotransmitter serotonin, the anti-inflammatory metabolite kynurenine, and, downstream of kynurenine, NAD+. Tryptophan 2,3-dioxygenase (TDO), which is primarily expressed in the liver, catalyzes the first step in the conversion of tryptophan to kynurenine. Changes in tryptophan metabolites can also reflect increasing inflammation: indoleamine 2,3-dioxygenase (IDO), which also catalyzes the conversion of tryptophan to kynurenine, is activated by proinflammatory cytokines (e.g., IFN-γ and TNF-α). Although kynurenine is produced in this way by an inflammatory process, it has an anti-inflammatory function, serving as a brake on the immune response. Multiple metabolites in the tryptophan metabolic pathway, including kynurenine, kynurenate, anthranilate, and xanthurenate, were increased in the severe and moderate nephropathy groups, and some were increased in the mild group compared with the normal group, indicative of increased inflammation (data not shown). C-glycosyltryptophan results from a posttranslational modification of tryptophan in which a sugar is linked by a carbon-carbon bond [30][31]. Certain posttranslational modifications, such as carbamylation, have been associated with chronic conditions including CKD [32]. Studies in humans have consistently shown increased levels of C-glycosyltryptophan in people with decreased eGFR [33][34][35]. Moreover, studies of humans and animal models suggest that C-glycosyltryptophan is a good biomarker of kidney function with more favorable properties than serum creatinine [35,36].
In our study, numerous xenobiotics, metabolized xenobiotics, and other metabolites were observed. Notable among them were therapeutic drugs, which can have significant systemic effects and are therefore a confounding factor in the data.

Conclusion
The differences in the blood plasma and serum metabolome of human subjects with increasing levels of nephropathy, as well as healthy control subjects, were examined. Overall, the metabolome changed significantly across the four groups. In the PCA, plasma and serum samples separated, with orthogonal separation by experimental group. Our study identified 6 potential metabolites that reproducibly and strongly associate with mGFR (pseudouridine, C-glycosyltryptophan, erythronate, N-acetylalanine, myo-inositol, and N-acetylcarnosine), whereas acetylthreonine did not. Of these, pseudouridine may be an ideal biomarker that does not depend on race. In addition, N6-carbamoylthreonyladenosine and hydroxyasparagine are strongly associated with mGFR and CKD and are unique to this study. 1,5-Anhydroglucitol (1,5-AG) may be a potential negative biomarker of kidney disease. In this study, serum and plasma samples may give generally consistent, though slightly different, metabolomic results; more samples are needed to confirm this hypothesis. Future studies will use 3-5 of these potential novel biomarkers to estimate the glomerular filtration rate without race as an input.

Declarations
Ethics approval and consent to participate
The protocol was approved by the institutional review board of Kiang Wu Hospital (KWH 2018-001). All volunteers were informed about the study and signed the consent form. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication
All authors have read the manuscript and consent to its publication.

Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors have declared that no competing interests exist.

Funding
This work was supported by The Science and Technology Development Fund, Macau SAR (File no. 0032/2018/A1).

Author Contributions
HQP and XL conceived the study. HQP, CWAI, TT, and TYT collected the data and carried out the experiment. HQP and ZL performed the data analyses and verified the analytical methods. HQP and XL contributed to the final version of the manuscript. All authors have read and approved the manuscript.
Random forest analysis of plasma from subjects with normal kidney function, mild nephropathy, moderate nephropathy, and severe nephropathy.