Study population
Some baseline characteristics differed significantly between the development and internal validation set and the external validation set. As shown in Table 1, blood lipid indices, including CHOL, LDL-C and APOA, were lower in the development and internal validation set than in the external validation set (4.82±1.44 vs 4.98±1.41, P=0.049; 2.95±1.14 vs 3.09±1.18, P=0.037; 1.28±0.25 vs 1.31±0.25, P=0.026; respectively). Glucocorticoid use was also more common in the development and internal validation set (P=0.04). mGFR, age, serum creatinine and the other characteristics did not differ significantly.
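The text does not state which test produced the P values for categorical variables. As a minimal sketch, assuming an uncorrected Pearson chi-square test on a 2x2 table, the comparison of glucocorticoid use (32 of 1333 vs 3 of 399, counts taken from Table 1) could be checked as follows. The function name `chi_square_2x2` is hypothetical and only used for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test without continuity correction for a 2x2 table.

    Rows are the two groups; columns are "treated" and "not treated".
    Assumption: this may not be the test the authors actually used.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Glucocorticoid use: modeling group 32/1333, external validation group 3/399.
chi2, p = chi_square_2x2(32, 1333 - 32, 3, 399 - 3)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
```

Under this assumption the P value comes out at about 0.04, in line with the value reported in Table 1.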
Performance of GFR estimation models
Of all the candidate variables, creatinine, cystatin C, weight, BMI, age, UA, BUN, HCT and APOB were selected by the RFE approach. Overall, the random forest regression models outperformed the revised regression models built on the same variables (Table 2), and the 9-variable random forest regression model performed best. In the 9-variable model, random forest regression was better than revised linear regression in terms of bias, precision, 30% accuracy and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80 and 16.88 vs 18.70, respectively; all P<0.01). In the 4-variable model, random forest regression improved precision and RMSE compared with the revised regression model (20.82 vs 25.25, P<0.01; 19.08 vs 20.60, P<0.001; respectively). Bias and 30% accuracy were also more favorable, but the differences were not statistically significant (0.34 vs 2.07, P=0.10; 0.80 vs 0.78, P=0.19; respectively).
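The four performance measures compared above follow the definitions given in the notes to Table 2. The sketch below shows one way they could be computed for paired measured and estimated GFR values. It assumes the difference is taken as estimated minus measured GFR and that 30% accuracy is the proportion of estimates within 30% of mGFR. The function name `gfr_metrics` and the sample values are made up for illustration and are not study data.

```python
import math
import statistics


def gfr_metrics(measured, estimated):
    """Bias, precision, 30% accuracy and RMSE for paired GFR values."""
    # Assumption: difference = estimated GFR minus measured GFR.
    diffs = [e - m for m, e in zip(measured, estimated)]
    # Bias: median difference.
    bias = statistics.median(diffs)
    # Precision: interquartile range of the differences.
    q1, _, q3 = statistics.quantiles(diffs, n=4, method="inclusive")
    precision = q3 - q1
    # 30% accuracy: share of estimates within 30% of the measured value.
    within_30 = sum(abs(e - m) <= 0.30 * m for m, e in zip(measured, estimated))
    p30 = within_30 / len(diffs)
    # RMSE: root mean squared error.
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"bias": bias, "precision": precision, "p30": p30, "rmse": rmse}


# Made-up illustrative values, not study data.
measured = [90.0, 60.0, 45.0, 120.0, 30.0]
estimated = [85.0, 66.0, 50.0, 100.0, 45.0]
metrics = gfr_metrics(measured, estimated)
print(metrics)
```

Quartile conventions differ between software packages, so the precision value can vary slightly with the chosen quantile method.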
Table 1. Characteristics of participants
Characteristic | Overall (n=1732) | Modeling group (n=1333) | External validation group (n=399) | P value
Number of participants | 1732 | 1333 | 399 | -
Gender (male) [n (%)] | 1047 (60.5%) | 805 (60.4%) | 242 (60.7%) | 0.925
mGFR (ml/min/1.73m2) | 87.01±37.6 | 87.38±38.18 | 85.80±35.75 | 0.463
Age (years) | 57.15±13.56 | 57.03±13.60 | 57.53±13.42 | 0.520
SBP (mmHg) | 137.67±21.45 | 137.14±21.55 | 139.47±21.05 | 0.057
DBP (mmHg) | 79.90±13.52 | 79.60±13.38 | 80.90±13.94 | 0.078
BMI (kg/m2) | 24.57±3.60 | 24.50±3.61 | 24.78±3.53 | 0.185
mGFR (ml/min/1.73m2) | 87.01±37.63 | 87.37±38.2 | 85.80±35.76 | 0.463
Hemoglobin (g/L) | 123.89±23.34 | 124.04±23.07 | 123.38±24.22 | 0.623
Hematocrit (%) | 0.37±0.06 | 0.37±0.06 | 0.36±0.07 | 0.616
MCV (fL) | 85.80±7.77 | 85.73±7.93 | 86.03±7.20 | 0.504
MCHC (g/L) | 337.90±18.51 | 338.28±15.21 | 336.61±26.71 | 0.114
Potassium (mmol/L) | 4.04±0.48 | 4.04±0.48 | 4.04±0.48 | 0.951
Sodium (mmol/L) | 140.59±3.06 | 140.54±3.15 | 140.77±2.74 | 0.184
Chloride (mmol/L) | 104.08±3.69 | 104.06±3.75 | 104.14±3.46 | 0.672
Calcium (mmol/L) | 2.30±0.15 | 2.30±0.15 | 2.29±0.16 | 0.171
Phosphate (mmol/L) | 1.23±0.25 | 1.23±0.25 | 1.23±0.26 | 0.644
HCO3- (mmol/L) | 23.25±2.93 | 23.20±2.97 | 23.44±2.81 | 0.150
ALB (g/L) | 38.40±4.91 | 38.38±4.86 | 38.46±5.09 | 0.770
PA (g/L) | 248.05±68.70 | 247.86±69.96 | 248.70±65.78 | 0.692
FBS (mmol/L) | 7.51±3.62 | 7.55±3.67 | 7.36±3.42 | 0.348
BUN (mmol/L) | 8.48±5.80 | 8.43±5.82 | 8.63±5.74 | 0.563
UA (umol/L) | 423.18±131.68 | 421.84±134.39 | 427.66±122.22 | 0.439
CHOL* (mmol/L) | 4.86±1.43 | 4.82±1.44 | 4.98±1.41 | 0.049
HDL-C (mmol/L) | 1.07±0.30 | 1.07±0.30 | 1.09±0.31 | 0.250
LDL-C* (mmol/L) | 2.98±1.15 | 2.95±1.14 | 3.09±1.18 | 0.037
APOA* (g/L) | 1.28±0.25 | 1.28±0.25 | 1.31±0.25 | 0.026
APOB (g/L) | 1.19±0.45 | 1.18±0.45 | 1.22±0.44 | 0.203
LPA (mmol/L) | 240.65±265.32 | 239.27±263.84 | 245.27±271.47 | 0.692
TRI (mmol/L) | 1.89±1.76 | 1.89±1.78 | 1.90±1.70 | 0.903
Cystatin C (mg/L) | 1.55±1.05 | 1.53±1.02 | 1.60±1.15 | 0.225
Creatinine (mg/dl) | 1.72±1.80 | 1.69±1.73 | 1.81±2.03 | 0.256
Medical treatment | | | |
Albumin used on the day of GFR measurement | 20 (1.2%) | 15 (1.1%) | 5 (1.3%) | 0.834
Diuretics used on the day of GFR measurement | 234 (13.5%) | 181 (13.6%) | 53 (13.3%) | 0.880
Glucocorticoids* | 35 (2.0%) | 32 (2.4%) | 3 (0.8%) | 0.04
Immunosuppressant | 10 (0.6%) | 8 (0.6%) | 2 (0.5%) | 0.891
Uric acid-lowering drugs | 198 (11.4%) | 155 (11.6%) | 43 (10.8%) | 0.639
Lipid-lowering drugs | 1045 (60.3%) | 794 (59.6%) | 251 (62.9%) | 0.231
EPO | 77 (4.4%) | 55 (4.1%) | 22 (5.5%) | 0.238
Iron supplement | 149 (8.6%) | 111 (8.3%) | 38 (9.5%) | 0.564
Calcium supplement | 228 (13.2%) | 164 (12.3%) | 64 (16.0%) | 0.053
Active vitamin D3 | 167 (9.6%) | 123 (9.2%) | 44 (11.0%) | 0.283
Inactive vitamin D3 | 7 (0.4%) | 6 (0.5%) | 1 (0.3%) | 0.582
Beta blocker | 340 (19.6%) | 254 (19.1%) | 86 (21.6%) | 0.270
Alpha blocker | 54 (3.1%) | 44 (3.3%) | 10 (2.5%) | 0.423
CCB | 671 (38.7%) | 518 (38.9%) | 153 (38.3%) | 0.853
ACEI | 83 (4.8%) | 63 (4.7%) | 20 (5.0%) | 0.814
ARB | 866 (50.0%) | 667 (50.0%) | 199 (49.9%) | 0.952
Diuretic | 232 (13.4%) | 182 (13.7%) | 50 (12.5%) | 0.660
Other variables | | | |
Smoking | 259 (15.0%) | 208 (15.6%) | 51 (12.8%) | 0.166
Drinking | 159 (9.2%) | 115 (8.6%) | 44 (11.0%) | 0.145
URI PRO | 678 (39.1%) | 520 (39.0%) | 158 (40.0%) | 0.825
HBP | 1062 (61.3%) | 812 (60.9%) | 250 (62.7%) | 0.531
Diabetes | 1299 (75.0%) | 992 (74.4%) | 307 (76.9%) | 0.307
CHD | 344 (19.9%) | 262 (19.7%) | 82 (20.6%) | 0.694
Stroke | 49 (2.8%) | 43 (3.2%) | 6 (1.5%) | 0.069
mGFR≥60 ml/min/1.73m2 | 1298 (74.9%) | 999 (74.9%) | 299 (74.9%) | 0.890
mGFR<60 ml/min/1.73m2 | 434 (25.1%) | 334 (25.1%) | 100 (25.1%) | 0.910
Note: Unless noted, categorical variables are expressed as number (percentage) and continuous variables as mean ± standard deviation; “*” = P < 0.05. Abbreviations used in the table are described in detail in the acronym table. LDL-C Low-density lipoprotein cholesterol, HDL-C High-density lipoprotein cholesterol, TRI Triglyceride, CHOL Total cholesterol, BMI Body mass index, HCO3- Bicarbonate, ALB Albumin, FBS Fasting blood sugar, BUN Blood urea nitrogen, UA Uric acid, PA Prealbumin, APOA Apolipoprotein A, APOB Apolipoprotein B, HCT Hematocrit, LPA Lipoprotein A, EPO Erythropoietin, CCB Calcium channel blocker, ARB Angiotensin receptor blocker, ACEI Angiotensin converting enzyme inhibitor, URI PRO Urine protein, HBP High blood pressure, CHD Chronic heart disease, MCV Mean corpuscular volume, MCHC Mean corpuscular hemoglobin concentration.
Table 2. Comparison of random forest regression model and revised CKD-EPI model
Model | Number of variables | Bias | Precision | Accuracy (30%) | RMSE
RF regression | 9 | 0.78** (-0.29, 1.66) | 16.90** (14.96, 19.39) | 0.84** (0.80, 0.88) | 16.88** (15.41, 18.89)
Revised regression | 9 | 2.98** (1.00, 5.38) | 23.62** (20.76, 26.20) | 0.80** (0.75, 0.83) | 18.70** (17.31, 20.48)
RF regression | 4 | 0.34 (-1.71, 1.71) | 20.82** (18.16, 23.36) | 0.80 (0.76, 0.84) | 19.08** (17.46, 22.15)
Revised regression | 4 | 2.07 (-1.01, 4.25) | 25.25** (22.50, 28.26) | 0.78 (0.74, 0.82) | 20.60** (19.06, 22.50)
Notes: Bias = median difference (95% CI); Precision = IQR of the difference (95% CI); Accuracy (30%) = 30% accuracy (95% CI); RMSE = root mean squared error (95% CI). ‘*’ means P<0.05 and ‘**’ means P<0.01 for the comparison of RF regression with revised regression. RF, random forest.
Nine variables: Creatinine, cystatin C, age, weight, BMI, UA, BUN, HCT and APOB
Four variables: Creatinine, cystatin C, age, sex
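Table 2 reports a 95% CI for each metric, but the notes do not say how the intervals were obtained. One common choice, assumed here purely for illustration, is a percentile bootstrap that resamples participants with replacement. The sketch below applies it to RMSE. The names `rmse` and `bootstrap_ci`, the number of resamples and the sample pairs are all hypothetical and are not taken from the study.

```python
import math
import random


def rmse(pairs):
    """Root mean squared error for (measured, estimated) GFR pairs."""
    return math.sqrt(sum((e - m) ** 2 for m, e in pairs) / len(pairs))


def bootstrap_ci(pairs, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a metric.

    Assumption: participants are resampled with replacement; the source
    does not confirm that this is how its intervals were computed.
    """
    rng = random.Random(seed)
    n = len(pairs)
    replicates = []
    for _ in range(n_boot):
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        replicates.append(metric(sample))
    replicates.sort()
    lower = replicates[int(n_boot * alpha / 2)]
    upper = replicates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Made-up (measured, estimated) pairs, not study data.
pairs = [(90.0, 85.0), (60.0, 66.0), (45.0, 50.0), (120.0, 100.0),
         (30.0, 45.0), (75.0, 70.0), (100.0, 108.0), (55.0, 52.0)]
point = rmse(pairs)
lower, upper = bootstrap_ci(pairs, rmse)
print(f"RMSE = {point:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

The same helper could be passed any other metric function, for example one returning the median difference, to obtain an interval for bias.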