An Early Prediction Model for Chronic Kidney Disease

Background: Identifying individuals predisposed to chronic kidney disease (CKD) is crucial for intervention and treatment; however, risk equations for middle-aged and elderly people in China are relatively rare.

Methods: A total of 116 CKD patients and 232 matched healthy controls from the “Tianjin Medical University Chronic Disease Cohort” (N = 21,750) were included in this study. One nested case-control set (N = 348; followed up for at least five years) was used to develop and internally validate the predictive model. Clinical, demographic, and laboratory features were collected and subjected to logistic regression analyses. Using weights equal to the natural logarithm of the odds ratios, several genetic and non-genetic risk factors were selected to construct predictive models. The final comprehensive prediction model is the arithmetic sum of the two optimal models.

Results: Transforming growth factor-β (TGF-β) and asymmetric dimethylarginine (ADMA) were effective biomarkers for CKD. The area under the curve (AUC), specificity, and sensitivity were 0.889, 0.851, and 0.770, respectively, for the non-genetic equation; 0.643, 0.794, and 0.838 for the genetic equation; and 0.894, 0.827, and 0.801 for the comprehensive prediction model. After bootstrap internal validation, the AUC of the comprehensive model was 0.820, indicating favorable predictive performance.

Conclusions: A comprehensive prediction model was established, which may help identify, at an early stage, the individuals most likely to develop CKD. Its feasibility in the clinic warrants further investigation.


Introduction
Chronic kidney disease (CKD), especially its complications, poses a serious threat to public health worldwide. According to the Global Burden of Disease Study [1], nearly 697.5 million cases of all-stage CKD were recorded in 2017, a global prevalence of 9.1% and an increase of 29.3% since 1990. Meanwhile, the global all-age mortality rate from CKD increased by 41.5% between 1990 and 2017. A cross-sectional study showed that the prevalence of CKD in China was about 10.8% [2], corresponding to approximately 119.5 million CKD patients in China.
Several risk factors are known to be strongly associated with CKD, including age [3], female sex [4], obesity [5], and diabetes mellitus [6]. Recently, several biomarkers have also been found to be associated with CKD. Previous studies have shown that elevated levels of ADMA (asymmetric dimethylarginine) can cause renal damage [7] [8], and several studies have pointed out that ADMA is a powerful biomarker for predicting mortality in CKD [9,10]. NGAL (neutrophil gelatinase-associated lipocalin) expression levels have also been shown to correlate with the degree of renal dysfunction, which may help to identify patients at high risk of a more rapid decline in renal function [11]. Furthermore, an increase in serum CysC (cystatin C) is correlated with a decrease in eGFR (estimated glomerular filtration rate) [12]. It has been suggested that CysC could be used together with serum creatinine as a new biomarker, or as a substitute for serum creatinine, to better identify kidney disease in the general population [13,14]. TGF-β (transforming growth factor-β) is the main regulator of tubulointerstitial fibrosis [15]; meanwhile, TGF-β signaling can influence several important renal injury responses in other growth factor signaling pathways [16,17], ultimately affecting the onset of CKD [18]. Previous studies have reported more than 50 single nucleotide polymorphisms (SNPs) associated with renal function indexes or CKD worldwide [19].
The treatment of CKD and renal failure is costly and rarely effective, yet fewer than 5% of patients with early CKD report awareness of their disease [20]. By the time CKD is diagnosed, glomerular damage often exceeds 50% and is usually irreversible. Effective prediction of CKD can therefore be immensely useful. Several CKD prediction models for different populations have been introduced [21][22][23][24]; recently, a study developed equations for predicting CKD based on 34 multinational cohorts [25]. Nevertheless, few models have considered both genetic and non-genetic risk factors. Although many prediction models reached high predictive power in relatively large populations, early prediction (at least while eGFR > 60 ml/(min·1.73 m²)) is essential for CKD treatment and prevention. In this study, we developed genetic, non-genetic (including biomarkers), and comprehensive risk score prediction models for CKD in a nested case-control study.

Study design and population
This research was designed as a nested case-control study involving 348 participants from the "Tianjin Medical University Chronic Disease Cohort". The cohort was established in 2006 with an initial 2,068 people undergoing annual physical examinations. By the end of 2018, a total of 21,750 people had been recruited, with a longest follow-up period of 13 years. We collected demographic and laboratory markers, together with genotyping results for 110 loci (including genome-wide genotyping data for 380 individuals). We screened for participants who met the following criteria: (i) a follow-up period of at least 5 years; (ii) no CKD at the first physical examination; and (iii) available blood samples and other essential information. Of the 1,804 eligible participants, 116 were selected as the case group and 232 as the control group, matched by sex and age (±3 years), giving a total of 348 subjects. This study was reviewed and approved by the Ethics Committee of Tianjin Medical University, and all participants signed informed consent.

Diagnostic criteria
The diagnostic criterion for CKD was eGFR < 60 ml/(min·1.73 m²) or positive proteinuria (≥1+). Glomerular filtration rate was estimated using the simplified Chinese MDRD equation [26]. Diabetes mellitus (DM) was determined according to the diagnostic criteria published by the World Health Organization (WHO) in 1999: fasting plasma glucose ≥7.0 mmol/L and/or 2-h postprandial glucose ≥11 mmol/L. Obesity was defined as body mass index (BMI) ≥28 kg/m², as recommended by the Ministry of Health's "Guidelines for the Prevention and Control of Overweight and Obesity among Chinese Adults" [27]. Hypertension was defined as systolic blood pressure (SBP) ≥140 mmHg and/or diastolic blood pressure (DBP) ≥90 mmHg, or a self-reported history of physician-diagnosed hypertension. Hyperuricemia (HUA) [28] was defined as a blood uric acid level ≥420 μmol/L in men and ≥360 μmol/L in women.
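The eGFR calculation and the diagnostic thresholds above can be sketched in Python as follows. This is a minimal illustration, not the study's code. The eGFR function assumes the commonly cited Chinese-modified abbreviated MDRD form, eGFR = 175 × Scr^-1.234 × age^-0.179 (× 0.79 if female) with Scr in mg/dL; whether reference [26] uses exactly these coefficients is an assumption. The `Participant` fields and the integer dipstick coding of proteinuria are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

UMOL_PER_MG_DL_CREATININE = 88.4  # 1 mg/dL serum creatinine = 88.4 umol/L


def egfr_chinese_mdrd(scr_umol_l: float, age_years: float, female: bool) -> float:
    """ASSUMED form of the Chinese-modified abbreviated MDRD equation.

    eGFR [ml/(min*1.73 m^2)] = 175 * Scr^-1.234 * age^-0.179 * (0.79 if female),
    with Scr in mg/dL. The coefficients are an assumption, not taken from [26].
    """
    scr_mg_dl = scr_umol_l / UMOL_PER_MG_DL_CREATININE
    egfr = 175.0 * scr_mg_dl ** -1.234 * age_years ** -0.179
    return egfr * 0.79 if female else egfr


@dataclass
class Participant:  # hypothetical record layout for illustration
    age: float
    female: bool
    scr_umol_l: float
    proteinuria_plus: int  # dipstick grade: 0 = negative, 1 = 1+, 2 = 2+, ...
    fpg_mmol_l: float      # fasting plasma glucose
    pg2h_mmol_l: float     # 2-h postprandial glucose
    bmi: float
    sbp: float
    dbp: float
    self_reported_htn: bool
    uric_acid_umol_l: float


def classify(p: Participant) -> dict:
    """Apply the thresholds stated in the Diagnostic criteria section."""
    egfr = egfr_chinese_mdrd(p.scr_umol_l, p.age, p.female)
    return {
        "eGFR": egfr,
        "CKD": egfr < 60 or p.proteinuria_plus >= 1,
        "DM": p.fpg_mmol_l >= 7.0 or p.pg2h_mmol_l >= 11.0,
        "obesity": p.bmi >= 28,
        "hypertension": p.sbp >= 140 or p.dbp >= 90 or p.self_reported_htn,
        "HUA": p.uric_acid_umol_l >= (360 if p.female else 420),
    }
```

For example, a 60-year-old man with Scr = 88.4 μmol/L (1 mg/dL) gets an eGFR of roughly 84 ml/(min·1.73 m²) under this assumed equation, so he would not meet the eGFR criterion for CKD.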

Measurements of biomarkers
After a 12-hour fast, participants' venous blood samples were collected into non-anticoagulant blood collection tubes between 7:30 and 9:00 am, left to stand at room temperature for 30 min, and then centrifuged at 3000 rpm at 4 °C for 10 min to separate the serum. The serum was stored at −80 °C until analysis. Fasting plasma glucose, serum creatinine, urea nitrogen, serum uric acid, total cholesterol, triglyceride, alanine aminotransferase, total protein, albumin, globulin, total bilirubin, and direct bilirubin were determined using a Hitachi automatic biochemical analyzer. Cystatin C (CysC), transforming growth factor-β (TGF-β), asymmetric dimethylarginine (ADMA), and neutrophil gelatinase-associated lipocalin (NGAL) were measured using enzyme-linked immunosorbent assay (ELISA) kits (Shanghai Huyu Biotechnology Co., Ltd.).

Selection of CKD-related non-genetic/genetic risk factors
We entered 21 potential risk factors, including several biomarkers, into univariate Cox proportional hazards models (Supplementary, Table S3); significant factors were then taken as explanatory variables in a multivariate Cox proportional hazards regression model, which finally yielded five non-genetic risk factors (Table 2; Figure 1). After obtaining access to part of the UK Biobank data, we used PLINK to perform genome-wide association studies (GWAS) of renal function-related indicators, including eGFR, SCr, and CysC. The GWAS results are shown in a Manhattan plot (Supplementary, Figure S1). Combined with the results of previous studies, a total of 10 SNP loci in 10 genes were screened (Supplementary, Table S1). In addition, after integrating information from GWAS databases, the UCSC Genome Bioinformatics database, and GWAS results for kidney function-related phenotypes in Asian or Chinese populations [29][30][31], SNP loci with both a high genotype relative risk (GRR) and a high genome-wide polygenic score (GPS) for CKD were selected. In total, 27 SNP loci from 24 genes were selected to construct the genetic risk model for CKD (Supplementary, Table S2). These 27 SNPs were genotyped in the 348 nested case-control subjects using a matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) platform. Hardy-Weinberg equilibrium (HWE) was checked for all 27 SNPs; two SNPs that deviated from HWE were excluded, leaving genotyping data for 25 SNPs.
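The HWE filtering step can be illustrated with a standard Pearson chi-square test (1 degree of freedom) on genotype counts. The study does not state which HWE test or significance threshold was applied, so both the test choice and the `alpha=0.05` default below are assumptions; the SNP names are placeholders, not real rsIDs.

```python
import math


def hwe_chisq(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    n_aa, n_bb: counts of the two homozygous genotypes; n_ab: heterozygote count.
    Returns (chi2 statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def snps_passing_hwe(genotype_counts: dict[str, tuple[int, int, int]],
                     alpha: float = 0.05) -> list[str]:
    """Keep SNPs whose HWE p-value is >= alpha (alpha is an assumed threshold)."""
    return [snp for snp, counts in genotype_counts.items()
            if hwe_chisq(*counts)[1] >= alpha]
```

Genotype counts of 25/50/25 match HWE exactly (chi2 = 0), whereas 50/0/50 (no heterozygotes) gives chi2 = 100 and would be excluded.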

Developing prediction models
In this study, genetic risk score (GRS) models and non-genetic risk score (NGRS) models were built by weighting each risk factor by the natural logarithm of its odds ratio (OR), i.e., its β coefficient. The combined effects of the genetic or non-genetic factors were calculated in a weighted way, and the optimal combination was selected to develop the CKD prediction model. The GRS equation was established based on the different contributions of each candidate SNP to the pathogenesis of CKD. Each SNP was considered a potential risk factor for CKD, and its weight for the onset of CKD was determined by its OR (or β) value from logistic regression analysis; several combinations were then established and screened for the optimal one. We used a weighted genetic risk score (wGRS), in which $\beta_i$, the weight of the ith SNP, is the natural logarithm of its OR (the estimated β coefficient), and $G_i$ is the number of risk alleles carried at the ith SNP (coded 0, 1, or 2). For each individual, the wGRS is thus the sum of risk alleles weighted by the β value of each SNP in logistic regression, as given in Formula (1):

$$\mathrm{wGRS} = \sum_{i=1}^{n} \beta_i G_i \qquad (1)$$

where n is the number of SNPs in the model.
In the above formula, to fix the weights in advance, we used the log-transformed ORs of single risk alleles from studies with large sample sizes and high reliability (e.g., meta-analyses) as the weights in the actual model construction.
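Formula (1) translates directly into code. The sketch below assumes per-allele ORs are already available (from logistic regression or from large meta-analyses, as described above); the SNP names and OR values used in the example are hypothetical.

```python
import math


def log_or_weights(odds_ratios: dict[str, float]) -> dict[str, float]:
    """Weight of each SNP = natural log of its per-allele odds ratio (beta_i)."""
    return {snp: math.log(or_) for snp, or_ in odds_ratios.items()}


def wgrs(genotypes: dict[str, int], weights: dict[str, float]) -> float:
    """Formula (1): wGRS = sum_i beta_i * G_i, where G_i is in {0, 1, 2} risk alleles."""
    score = 0.0
    for snp, beta in weights.items():
        g = genotypes[snp]
        if g not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1 or 2, got {g}")
        score += beta * g
    return score
```

With hypothetical ORs of 1.2 (`snp_a`) and 1.5 (`snp_b`), an individual carrying two risk alleles at `snp_a` and one at `snp_b` scores 2·ln 1.2 + ln 1.5 = ln 2.16 ≈ 0.770.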
The non-genetic risk score model is built on the same principle as the GRS. That is, according to the different contributions of the identified CKD-related non-genetic risk factors (e.g., normal high value of TGF-β, older age) to the incidence of CKD, the OR (or β) values from logistic regression analysis are used to assign weights, different combinations are established, and the optimal combination is selected. We used a weighted non-genetic risk score (wNGRS), in which $\beta_i$, the weight of the ith non-genetic risk factor, is the natural logarithm of its OR from logistic regression analysis, and $S_i$ indicates the ith non-genetic risk factor. For each individual, the wNGRS is the sum of the risk factors weighted by their β values, as given in Formula (2):

$$\mathrm{wNGRS} = \sum_{i=1}^{m} \beta_i S_i \qquad (2)$$

where m is the number of non-genetic risk factors in the model.
In the above formula, S is the vector of non-genetic risk factors ($S_i$ represents the state of the ith factor: 1 if the individual has the risk factor and 0 otherwise). The β value used in this study was that of each non-genetic risk factor in the logistic regression analysis.
The comprehensive risk score model integrates the optimal GRS and NGRS models and is their sum, as given in Formula (3):

$$\text{Comprehensive score} = \mathrm{wGRS} + \mathrm{wNGRS} \qquad (3)$$
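Formulas (2) and (3) can be sketched the same way. The factor names and ORs below are hypothetical placeholders, not the study's fitted values, and each $S_i$ is coded 1 if the risk factor is present and 0 otherwise. Because both scores are sums of log-ORs, exp(score) equals the product of the corresponding ORs, which implicitly assumes that the factors act multiplicatively on the odds without interactions.

```python
import math


def wngrs(risk_flags: dict[str, bool], weights: dict[str, float]) -> float:
    """Formula (2): wNGRS = sum_i beta_i * S_i, with S_i = 1 if factor i is present, else 0."""
    return sum(beta * (1 if risk_flags[name] else 0) for name, beta in weights.items())


def comprehensive_score(wgrs_value: float, wngrs_value: float) -> float:
    """Formula (3): the comprehensive score is the arithmetic sum of the two scores."""
    return wgrs_value + wngrs_value


# Hypothetical ORs for illustration only; these are NOT the study's estimates.
example_ors = {"elderly": 2.0, "diabetes": 3.0, "high_normal_tgfb": 1.5}
example_weights = {name: math.log(or_) for name, or_ in example_ors.items()}
```

An elderly individual with a normal high TGF-β value but no diabetes would score ln 2 + ln 1.5 = ln 3 under these placeholder weights, and that value is simply added to the individual's wGRS.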

Prediction model evaluation
The GRS model, NGRS model, and comprehensive prediction model were evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). MedCalc software was used to determine the optimal cut point of each ROC curve and the sensitivity and specificity at that cut point. The three models were internally validated within the nested case-control study using bootstrap five-fold cross-validation. All data analyses were performed using SPSS 21.0 software.
Statistical significance was set at P < 0.05.
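The AUC and the optimal cut point can be sketched without external libraries. The AUC is computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The sketch assumes the optimal cut point maximizes the Youden index J = sensitivity + specificity - 1, a common criterion in ROC software; the study does not state which criterion it used.

```python
def roc_auc(scores_cases: list[float], scores_controls: list[float]) -> float:
    """AUC = P(case score > control score), counting ties as 1/2 (Mann-Whitney form)."""
    wins = 0.0
    for s_case in scores_cases:
        for s_ctrl in scores_controls:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


def youden_cutoff(scores_cases: list[float],
                  scores_controls: list[float]) -> tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A score >= cutoff is classified as positive (predicted CKD).
    The Youden criterion is an assumption, not confirmed by the study.
    """
    best = None
    for c in sorted(set(scores_cases) | set(scores_controls)):
        sens = sum(s >= c for s in scores_cases) / len(scores_cases)
        spec = sum(s < c for s in scores_controls) / len(scores_controls)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]
```

For case scores [0.9, 0.8, 0.4] and control scores [0.1, 0.3, 0.5], eight of the nine case-control pairs are correctly ordered, so the AUC is 8/9 ≈ 0.889.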

Results
In this nested case-control study, 348 participants (all with eGFR ≥ 60 ml/(min·1.73 m²) at baseline) were included (116 cases and 232 controls; subjects whose eGFR fell below 60 ml/(min·1.73 m²) during the 5-year follow-up were considered cases) to build a 5-year risk prediction model for CKD onset. Baseline characteristics of the included participants are described in Table 1.
The levels of fasting plasma glucose (FPG), total cholesterol (TC), urea nitrogen (BUN), serum creatinine (SCr), total protein (TP), globulin (GLB), systolic blood pressure (SBP), cystatin C (CysC), transforming growth factor-β (TGF-β), and asymmetric dimethylarginine (ADMA) were significantly higher in the CKD group than in controls. The CKD group was also significantly older than the non-CKD group, and the prevalence of type 2 diabetes and hyperuricemia was higher (Table 1). Triglyceride (TG), serum uric acid (SUA), and body mass index (BMI) levels were also higher in the CKD group than in the non-CKD group, but the differences were not statistically significant.

Non-genetic risk factors for CKD
The Cox proportional hazards regression model showed that age, diabetes mellitus, normal high value of urea nitrogen, normal high value of TGF-β, and ADMA were independent risk factors for CKD (Table 2; Supplementary, Table S3). Kaplan-Meier survival analyses showed that older age, normal high value of urea nitrogen, normal high value of TGF-β, normal high value of ADMA, and diabetes were significantly associated with CKD onset in our cohort (Fig. 1). Here, age ≥ 60 years was defined as elderly, and for each of the other measurements the upper quartile was taken as its normal high value.
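The dichotomization and Kaplan-Meier analysis can be sketched as below. Two assumptions are made: "normal high value" is taken to mean a value at or above the third quartile (Q3), computed with the default method of Python's `statistics.quantiles`, and tied times follow the usual convention that subjects censored at time t are still at risk at t. A formal comparison of the resulting curves (e.g., a log-rank test) is not shown.

```python
import statistics


def upper_quartile_flags(values: list[float]) -> list[bool]:
    """Flag values at or above Q3 (ASSUMED meaning of 'normal high value')."""
    q3 = statistics.quantiles(values, n=4)[2]  # default 'exclusive' method
    return [v >= q3 for v in values]


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate of CKD-free survival.

    times: follow-up time per subject; events: 1 = CKD onset, 0 = censored.
    Returns (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        onsets = censored = 0
        # Group all subjects sharing time t; those censored at t remain at risk at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                onsets += 1
            else:
                censored += 1
            i += 1
        if onsets:
            survival *= 1 - onsets / n_at_risk
            curve.append((t, survival))
        n_at_risk -= onsets + censored
    return curve
```

To compare groups, one would call `kaplan_meier` separately on the subjects flagged and not flagged by `upper_quartile_flags`.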

Non-genetic risk score (NGRS) prediction model for CKD
A total of 5 predictors, including age, diabetes mellitus, normal high value of BUN, normal high value of TGF-β, and ADMA were included in the non-genetic prediction model for CKD. Among four models (Supplementary Material S1;

Genetic risk score (GRS) prediction model for CKD
By integrating the CKD-related genetic loci identified in the UK Biobank subjects (Supplementary, Table S1) with the results of previous studies, 25 SNPs were analyzed for their correlation with CKD by logistic regression analysis (Supplementary Table S7 S2+0.84×S3+0.497×S4+0.603×S5. The predictive power of the comprehensive CKD prediction model was higher than that of either the non-genetic or the genetic prediction model: the AUC was 0.894 (95% CI, 0.857-0.931), the OR was 3.758 (95% CI, 2.827-4.997), the sensitivity was 0.827, and the specificity was 0.801 (Table 3, Figure 2).

Internal validation
In the nested case-control study, bootstrap five-fold cross-validation was carried out for each CKD prediction model. After averaging the validation results, the AUC values of the non-genetic, genetic, and comprehensive prediction models were 0.786, 0.692, and 0.820, respectively.
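The internal validation step can be sketched as repeated k-fold cross-validation of a log-OR-weighted score. The paper does not detail its "bootstrap five-fold" resampling scheme, so the following is an assumption: the data are reshuffled on each repeat, weights are re-estimated on the training folds only (here as univariate log-ORs from 2×2 tables with a 0.5 continuity correction, a simplification of the logistic regression weights described in the Methods), and the AUCs on the held-out folds are averaged. The function names are hypothetical.

```python
import math
import random


def log_or(x: list[int], y: list[int]) -> float:
    """ln(OR) of binary factor x for binary outcome y from a 2x2 table.

    A 0.5 continuity correction is added to every cell (an assumption).
    """
    a = sum(1 for xi, yi in zip(x, y) if xi and yi) + 0.5          # exposed cases
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi) + 0.5      # exposed controls
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi) + 0.5      # unexposed cases
    d = sum(1 for xi, yi in zip(x, y) if not xi and not yi) + 0.5  # unexposed controls
    return math.log((a * d) / (b * c))


def auc(scores: list[float], y: list[int]) -> float:
    """Mann-Whitney AUC; ties count 1/2."""
    pos = [s for s, yi in zip(scores, y) if yi]
    neg = [s for s, yi in zip(scores, y) if not yi]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cv_auc(X: list[list[int]], y: list[int], k: int = 5,
           repeats: int = 20, seed: int = 0) -> float:
    """Repeated k-fold CV: weights fitted on training folds, AUC on the held-out fold."""
    rng = random.Random(seed)
    n_factors = len(X[0])
    aucs = []
    for _ in range(repeats):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            y_tr = [y[i] for i in train]
            weights = [log_or([X[i][j] for i in train], y_tr) for j in range(n_factors)]
            scores = [sum(w * X[i][j] for j, w in enumerate(weights)) for i in test]
            y_te = [y[i] for i in test]
            if 0 < sum(y_te) < len(y_te):  # AUC needs both classes in the fold
                aucs.append(auc(scores, y_te))
    if not aucs:
        raise ValueError("no fold contained both cases and controls")
    return sum(aucs) / len(aucs)
```

Re-estimating the weights inside each training fold is what makes the validated AUC lower than the apparent AUC; scoring held-out subjects with weights fitted on all data would not correct for optimism.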

Discussion
Early prediction of CKD is challenging. Decades of research have shown that diabetic nephropathy, primary glomerulonephritis, hypertension, interstitial nephritis, and polycystic kidney disease can all lead to CKD. Awareness of CKD is notoriously low, and once CKD has developed, treatment is usually limited until dialysis and renal transplantation are needed as last remedies for end-stage renal disease (ESRD). The eGFR is a sensitive indicator of renal function; however, it is not an early predictor of CKD. Although many biomarkers have been tested for CKD, reappraisal in large prospective cohort studies is needed. In search of an early, sensitive, easy-to-perform, and cost-effective prediction model, we carried out a nested case-control study for CKD prediction within the "Tianjin Medical University Chronic Disease Cohort" [32,33]; because the study is drawn from the local population, it facilitates prediction of the 5-year probability of CKD onset in this area. The average age of the subjects was 63 years, so these individuals are more likely to develop CKD than younger people.
We combined traditional laboratory indicators, multiple renal function-related biomarkers, and SNP loci to develop CKD prediction models. In the NGRS model, we not only included indicators used in other studies, such as diabetes and age [25,34,35], but also employed several biomarkers, especially TGF-β and ADMA, as early CKD predictors.
Although hundreds of associations between CKD and susceptibility genes have been found, and large GWASs have yielded highly significant results, genetic factors seem to provide only a small improvement to prediction models. For a given SNP, the genotype relative risk (GRR) can be high, yet its contribution to CKD risk in the general population is limited. All 17 SNPs employed in our study were derived from GWASs in the UK Biobank and other large cohorts; however, the AUC of the genetic risk score (GRS) model was only 0.643, and it gave only a marginal improvement in the AUC of the comprehensive model (from 0.889 to 0.894). A study in Japan likewise showed that genetic predictors do not significantly improve the predictive efficiency of a comprehensive prediction model [29]. Even SNPs with very significant associations with CKD in large GWASs (i.e., a high GRR) may explain only a limited proportion of the phenotypic variance.
Several biomarkers were tested and included in our prediction model. The serum TGF-β level, along with ADMA, provided better predictive value than the more direct glomerular filtration indicator cystatin C. In our previous study, we found that TGF-β pathway genes were highly expressed in renal biopsies from very early-stage diabetic nephropathy, long before renal fibrosis and decreased filtration occurred.
Indeed, screening for early biomarkers before eGFR declines may allow CKD to be predicted several years earlier, although effective early treatment remains another obstacle to overcome.
Recently, numerous predictive models have been established and have come into clinical use for decision-making, including several models that estimate the risk of prevalent and incident CKD [22,34-37]. However, because of differences in race, lifestyle, and geographic environment, it is still necessary to develop effective CKD prediction models for different ethnic groups. Such models can help identify people at higher CKD risk earlier, improving health care by allocating resources to the individuals who benefit most while avoiding overuse of health care resources by individuals at low risk.
This study has several limitations. First, CKD-related biomarkers were studied in a nested case-control set selected from a chronic disease cohort, and the relatively small sample size may have introduced some bias into the results. Second, our risk prediction model focused only on the onset of CKD and did not assess progression to renal failure or other complications. Third, the participants of the "Tianjin Medical University Chronic Disease Cohort" were mostly teachers and government employees working in urban areas, who tend to be more self-disciplined and health-conscious. Whether our prediction model applies to other populations requires further external validation. In future studies, we will measure more renal function-related biomarkers in larger cohorts to validate and improve the CKD prediction model.

Conclusions
Age, diabetes, normal high values of urea nitrogen and TGF-β, and ADMA are independent indicators of CKD. In addition, an integrated predictive model was established to evaluate individuals' risk of CKD at an early stage, so that early and appropriate intervention can be applied before the disease worsens or becomes irreversible.