2.1. Study design and population
This research was designed as a nested case-control study involving 348 participants from the "Tianjin Medical University Chronic Diseases Cohort". The cohort was established in 2006, when 2,068 people attending an annual physical examination were enrolled. By the end of 2018, a total of 21,750 people had been recruited to the cohort, with a maximum follow-up period of 13 years. Demographic markers, laboratory markers, and genotyping results for 110 loci had been collected (including 380 cases with genome-wide genotyping data). We screened for participants who met the following criteria: (i) a follow-up period of at least 5 years; (ii) no CKD at the first physical examination; and (iii) available blood samples and other important information. Of the 1,804 eligible participants, 116 were selected as the case group and 232 as the control group, matched on gender and age (±3 years), giving a total of 348 subjects.
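The matching step can be illustrated with a minimal sketch. The text states only the matching criteria (same gender, age within ±3 years, two controls per case), so the greedy closest-age strategy, the `Person` record, and the function name `match_controls` below are assumptions made for illustration, not the procedure actually used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str   # participant identifier (hypothetical)
    sex: str   # "M" or "F"
    age: int   # age in years at baseline


def match_controls(cases, pool, ratio=2, age_tol=3):
    """Greedy matching: for each case, pick up to `ratio` unused controls of
    the same sex whose age differs by at most `age_tol` years, closest first."""
    used = set()
    matches = {}
    for case in cases:
        eligible = [
            c for c in pool
            if c.pid not in used
            and c.sex == case.sex
            and abs(c.age - case.age) <= age_tol
        ]
        # Closest age first; the identifier breaks ties deterministically.
        eligible.sort(key=lambda c: (abs(c.age - case.age), c.pid))
        chosen = eligible[:ratio]
        used.update(c.pid for c in chosen)
        matches[case.pid] = [c.pid for c in chosen]
    return matches
```

A greedy pass is order-dependent and may leave some cases with fewer than two controls; an optimal matching algorithm would avoid this at the cost of more code.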
This study was reviewed and approved by the Ethics Committee of Tianjin Medical University, and all participants provided signed informed consent.
2.2. Diagnostic criteria
CKD was defined as eGFR < 60 mL/(min·1.73 m²) or positive proteinuria (≥1+). The glomerular filtration rate was estimated using the simplified Chinese MDRD equation [26]. Diabetes mellitus (DM) was determined according to the diagnostic criteria published by the World Health Organization (WHO) in 1999: fasting plasma glucose ≥7.0 mmol/L and/or 2 h postprandial glucose ≥11.1 mmol/L. Obesity was defined as a body mass index (BMI) ≥28 kg/m² according to the "Guidelines for the Prevention and Control of Overweight and Obesity among Chinese Adults" [27] issued by the Ministry of Health. Hypertension was defined as systolic blood pressure (SBP) ≥140 mmHg and/or diastolic blood pressure (DBP) ≥90 mmHg, or a self-reported history of physician-diagnosed hypertension. Hyperuricemia (HUA) [28] was defined as a blood uric acid level ≥420 μmol/L in men and ≥360 μmol/L in women.
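The thresholds above can be written as simple classification rules. The following is a minimal sketch; the function and parameter names are hypothetical. eGFR is taken as an input because the equation from [26] is not reproduced in this section. Proteinuria is assumed to be coded as an integer dipstick grade (0 for negative, 1 for 1+, and so on), and the 2 h glucose cut-off is written as 11.1 mmol/L, the WHO 1999 value.

```python
def has_ckd(egfr, proteinuria_grade):
    """CKD: eGFR < 60 mL/(min*1.73 m^2) or proteinuria of 1+ or higher."""
    return egfr < 60 or proteinuria_grade >= 1


def has_diabetes(fpg_mmol_l, glucose_2h_mmol_l=None):
    """DM: fasting glucose >= 7.0 mmol/L and/or 2 h glucose >= 11.1 mmol/L."""
    if fpg_mmol_l >= 7.0:
        return True
    return glucose_2h_mmol_l is not None and glucose_2h_mmol_l >= 11.1


def is_obese(weight_kg, height_m):
    """Obesity: BMI >= 28 kg/m^2."""
    return weight_kg / height_m ** 2 >= 28


def has_hypertension(sbp_mmhg, dbp_mmhg, self_reported=False):
    """Hypertension: SBP >= 140, DBP >= 90, or physician-diagnosed history."""
    return sbp_mmhg >= 140 or dbp_mmhg >= 90 or self_reported


def has_hyperuricemia(uric_acid_umol_l, sex):
    """HUA: uric acid >= 420 umol/L in men ("M"), >= 360 umol/L in women."""
    threshold = 420 if sex == "M" else 360
    return uric_acid_umol_l >= threshold
```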
2.3. Measurements of biomarkers
After a 12-hour fast, venous blood samples were collected from participants between 7:30 and 9:00 am into non-anticoagulant blood collection tubes, left to stand at room temperature for half an hour, and then centrifuged at 3000 rpm at 4 °C for 10 min to separate the serum. The serum was stored at −80 °C until analysis. Levels of fasting plasma glucose, serum creatinine, urea nitrogen, serum uric acid, total cholesterol, triglyceride, alanine aminotransferase, total protein, albumin, globulin, total bilirubin, and direct bilirubin were determined using a Hitachi automatic biochemical analyzer. Cystatin C (CysC), transforming growth factor beta (TGF-β), asymmetric dimethylarginine (ADMA), and neutrophil gelatinase-associated lipocalin (NGAL) were measured using enzyme-linked immunosorbent assay (ELISA) kits (Shanghai Huyu Biotechnology Co., LTD).
2.4. Selection of CKD-related non-genetic/genetic risk factors
We entered 21 potential risk factors, including several biomarkers, into univariate Cox proportional hazards models (Supplementary, Table S3). The significant factors were then entered as explanatory variables into a multivariate Cox proportional hazards regression model, which yielded five non-genetic risk factors (Table 2; Figure 1).
After obtaining access to part of the UK Biobank data, we used PLINK to perform genome-wide association studies (GWAS) of renal function indicators, including eGFR, serum creatinine (SCr), and CysC. The GWAS results are shown in the Manhattan plot (Supplementary, Figure S1). Combined with the results of previous studies, a total of 10 SNP loci on 10 genes were screened (Supplementary, Table S1). Meanwhile, after integrating information from GWAS databases, the UCSC Genomic Bioinformatics Database, and GWAS results for kidney function related phenotypes in Asia or China [29-31], SNP loci with both a high genotype relative risk (GRR) and a high genome-wide polygenic score (GPS) for CKD were selected. Finally, we selected a total of 27 SNP loci from 24 genes to construct the genetic risk model for CKD (Supplementary, Table S2). The 27 selected SNPs were genotyped in the 348 nested case-control subjects using a matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) platform. Hardy-Weinberg equilibrium (HWE) was checked for all 27 SNPs; 2 SNPs that failed HWE were removed, so genotyping data for 25 SNPs were retained.
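As an illustration of the HWE check, the sketch below applies a chi-square goodness-of-fit test with one degree of freedom to the genotype counts of a single biallelic SNP. The text does not state which HWE test, which subjects, or which significance threshold were used, so the test choice and the function name `hwe_chi_square` are assumptions.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Takes the observed counts of the three genotypes (AA, AB, BB) and
    returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For SNPs with small expected genotype counts, an exact test is usually preferred over the chi-square approximation.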
2.5. Developing prediction models
In this study, genetic risk score (GRS) models and non-genetic risk score (NGRS) models were built using weights equal to the natural logarithm (β) of each risk factor's odds ratio (OR). The combined effects of the non-genetic or genetic factors were calculated as weighted sums, and the optimal combination was selected to develop the prediction model of CKD. The GRS equation was based on the contribution of each candidate SNP to the pathogenesis of CKD, with each SNP considered a potential risk factor for CKD. The weight of each SNP's contribution to the onset of CKD was determined by its OR (or β) value from logistic regression analysis; several combinations were established and screened for the optimal one. We used a weighted genetic risk score (wGRS), $\mathrm{wGRS}=\sum_{i=1}^{n}\beta_i G_i$, where $\beta_i$ is the weight of the $i$th SNP and $G_i$ is the number of risk alleles at the $i$th SNP (coded 0, 1, or 2). The weight is the natural logarithm of the SNP's OR, that is, its estimated effect (β coefficient). For each individual, the wGRS is the sum of the risk allele counts weighted by the β value of each SNP from logistic regression. See Formula (1) for details.
$$\mathrm{wGRS} = \sum_{i=1}^{n} \beta_i G_i \tag{1}$$
To fix the weights in the above formula in advance, we used the log-transformed per-risk-allele effect values reported by studies with large sample sizes and high reliability (e.g., meta-analyses) as the weights in the actual model construction.
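The wGRS calculation can be sketched in a few lines. The SNP identifiers and OR values in the example are hypothetical placeholders, not the loci or weights used in this study, and the function name `weighted_grs` is likewise an assumption.

```python
import math


def weighted_grs(risk_allele_counts, odds_ratios):
    """Return the sum of ln(OR_i) * G_i over the SNPs listed in `odds_ratios`.

    risk_allele_counts: dict mapping SNP id to risk allele count (0, 1, or 2)
    odds_ratios: dict mapping SNP id to its per-allele odds ratio
    """
    score = 0.0
    for snp, odds_ratio in odds_ratios.items():
        count = risk_allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk allele count must be 0, 1, or 2")
        score += math.log(odds_ratio) * count  # beta_i * G_i
    return score


# Hypothetical SNP identifiers and per-allele ORs, for illustration only.
example_ors = {"snp_a": 1.2, "snp_b": 1.5, "snp_c": 0.9}
example_genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
example_score = weighted_grs(example_genotype, example_ors)
```

In this example the score equals 2·ln(1.2) + ln(1.5), because the third SNP carries no risk alleles and contributes nothing.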
The non-genetic risk score model was built on the same principle as the GRS. That is, according to the contribution of each identified CKD-related non-genetic risk factor (e.g., high-normal TGF-β level, older age) to the incidence of CKD, the OR (or β) values from logistic regression analysis were used to determine the weights for the onset of CKD, different combinations were established, and the optimal combination was selected. We used a weighted non-genetic risk score (wNGRS), $\mathrm{wNGRS}=\sum_{i=1}^{n}\beta_i S_i$, where $\beta_i$ is the weight of the $i$th non-genetic risk factor for developing CKD and $S_i$ is the status of the $i$th non-genetic risk factor. The weight β is the natural logarithm of the OR obtained for each risk factor by logistic regression analysis. For each individual, the wNGRS is the sum of the risk factors weighted by their β values from logistic regression. See Formula (2) for details.
$$\mathrm{wNGRS} = \sum_{i=1}^{n} \beta_i S_i \tag{2}$$
In the above formula, S is the vector of non-genetic risk factor indicators: Si represents the status of the ith non-genetic risk factor and takes the value 1 if the individual has the risk factor and 0 if not. The β values used in this study were those of each non-genetic risk factor from the logistic regression analysis.
The comprehensive risk score model integrates the optimal GRS model and the NGRS model and is calculated as the sum of the two scores. See Formula (3) for details.
$$\text{Comprehensive risk score} = \mathrm{wGRS} + \mathrm{wNGRS} \tag{3}$$
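The non-genetic score and the comprehensive score can be sketched in the same way. The factor names and OR values below are hypothetical examples rather than the five factors or weights identified in this study, and the function names are assumptions.

```python
import math


def weighted_ngrs(factor_status, odds_ratios):
    """Return the sum of ln(OR_i) * S_i, where S_i is 1 if the factor is
    present and 0 if it is absent."""
    return sum(
        math.log(odds_ratio) * (1 if factor_status[factor] else 0)
        for factor, odds_ratio in odds_ratios.items()
    )


def comprehensive_score(wgrs, wngrs):
    """Comprehensive risk score: the sum of the genetic and non-genetic scores."""
    return wgrs + wngrs


# Hypothetical factor names and ORs, for illustration only.
example_ors = {"older_age": 2.0, "high_normal_tgf_beta": 1.8, "hypertension": 1.5}
example_status = {"older_age": True, "high_normal_tgf_beta": False, "hypertension": True}
example_wngrs = weighted_ngrs(example_status, example_ors)
example_total = comprehensive_score(0.5, example_wngrs)  # 0.5 is a made-up wGRS
```

Because both scores are sums of log odds ratios, adding them puts genetic and non-genetic factors on the same scale.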
2.6. Prediction model evaluation
The constructed GRS model, NGRS model, and comprehensive prediction model were evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). MedCalc software was used to determine the optimal cut point of the ROC curve and the sensitivity and specificity at that cut point, thereby assessing the predictive performance of the constructed CKD prediction models. The three models were internally validated within the nested case-control study using Bootstrap five-fold cross-validation. Other data analyses were performed using SPSS 21.0 software. Statistical significance was set at P < 0.05.
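The evaluation metrics can be illustrated with a short sketch. The text does not state which criterion MedCalc applied to choose the optimal cut point, so maximizing the Youden index (sensitivity + specificity − 1) is an assumption here, as are the function names. The cross-validation step is not shown.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen control (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def youden_cut_point(scores, labels):
    """Return (cut, sensitivity, specificity) maximizing the Youden index.
    A subject is predicted to be a case when score >= cut."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Made-up risk scores for four controls (0) and three cases (1).
example_scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.9]
example_labels = [0, 0, 0, 0, 1, 1, 1]
example_auc = roc_auc(example_scores, example_labels)          # 11/12
example_cut = youden_cut_point(example_scores, example_labels)  # (0.4, 1.0, 0.75)
```

In the toy data, a cut point of 0.4 classifies all three cases correctly and misclassifies one of the four controls.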