The Effect of Ferritin on Arteriovenous Fistula Survival in Hemodialysis Patients: Analyzing Using Data Mining Technique.

Background: Patients with end-stage renal disease (ESRD) need either a kidney transplant or dialysis to stay alive. Hemodialysis (HD) requires vascular access (VA) to the bloodstream. The arteriovenous fistula (AVF) is the preferred vascular access procedure because of its lower morbidity and mortality. Follow-up of the AVF is vital for detecting factors that contribute to failure or survival of the fistula. Methods: This study followed up 113 HD patients who underwent AVF surgery from 2015 to 2018 at Hasheminejad Kidney Center in Tehran. Using a predictive approach, we investigated the relationship between biomarkers such as serum ferritin, the SI/TIBC ratio and hemoglobin (HB) and fistula function. A decision tree based on the CHAID algorithm, one of the data mining approaches for data classification, was used. Results: The decision trees identified the ferritin and SI/TIBC classes in which the fistula had a better survival rate. Prediction accuracy ranged from 62.50% to 66.07%. Conclusions: AVF survival was better in ESRD patients who had SI/TIBC<30% or a serum ferritin level lower than or equal to 200 ng/ml, specifically when their HB was greater than 8 g/dl.


Introduction
End-stage renal disease (ESRD) is the final and most adverse stage of chronic kidney disease (CKD), in which the kidneys can no longer function well enough to meet the needs of daily life. The most common treatment for ESRD is hemodialysis (HD), which requires permanent vascular access (VA) to connect the patient's circulatory system to the dialysis machine [1]. Three main types of VA are used in HD: native arteriovenous fistula (AVF), synthetic arteriovenous graft (AVG), and central venous catheter (CVC). Of these, AVF is the most prevalent, since it is associated with fewer complications, lower morbidity and mortality, and a superior survival rate compared with the other types of VA [2]. Hence, it is recommended by the guidelines of different countries [3]. Despite these advantages, AVF failure due to complications such as venous stenosis and thrombosis is one of the main causes of hospitalization and morbidity in the HD population [4]. Thus, anything that intensifies these complications should be avoided.
Anemia is one of the most common complications of chronic kidney disease, especially among ESRD patients. It results from deficient renal production of erythropoietin and consequently increases the risk of AVF failure [5]. The fistula survival rate in severely anemic patients (HB<8 g/dl) is low. However, in patients with HB levels of 8-10 g/dl, 10-12 g/dl or >12 g/dl, no significant risk factor threatens fistula survival [6]. Despite the studies on the relationship between HB and fistula survival, there has been no investigation of the effect of biomarkers such as serum ferritin and the SI/TIBC ratio on fistula survival.
Serum ferritin is an indicator of body iron storage [7], [8] and is used to monitor iron therapy in patients with chronic renal disease [9], [10]. Inflammatory conditions such as chronic disease, infection and cancer, as well as endogenous and exogenous iron, can alter ferritin levels [11], [12]. High levels of serum ferritin cause atherosclerosis in HD patients, especially when the ferritin level is >500 ng/ml [13]. One study showed that the highest mortality risk is associated with a major rise in ferritin in HD patients with baseline ferritin>200 ng/ml and with a modest rise in those with baseline ferritin>800 ng/ml [14].
Iron saturation (ISAT) is the ratio of serum iron (SI) to total iron-binding capacity (TIBC) and is one of the markers used to indicate iron deficiency or overabundance. Studies show that low ISAT (less than 20-24%) is associated with higher mortality [15,16] and that ISAT of 35-50% is associated with a lower mortality rate in maintenance HD patients [17].
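The ratio defined above can be written as a one-line calculation. The following is a minimal sketch; the function name and the example values are hypothetical and are not taken from the study data.

```python
def iron_saturation(serum_iron: float, tibc: float) -> float:
    """Return ISAT as a percentage: 100 * SI / TIBC.

    Both inputs must be in the same unit (for example ug/dl).
    """
    if tibc <= 0:
        raise ValueError("TIBC must be positive")
    return 100.0 * serum_iron / tibc


# Hypothetical values for illustration only (not patient data from this study).
example_isat = iron_saturation(serum_iron=60.0, tibc=300.0)
print(f"ISAT = {example_isat:.1f}%")
```

Expressing ISAT as a percentage matches the cutoffs quoted from the literature; dividing the result by 100 gives the fraction form (for example 0.3) used for the decision tree rules later in the paper.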
Data mining is defined as the process of extracting knowledge, or previously unknown and valid patterns, from large data sets [18]. The use of data mining in healthcare-related studies is growing steadily, as it provides in-depth analysis of voluminous and complicated data that are difficult to process with traditional methods. Data mining methods are used today as effective tools in medical diagnosis, outcome prediction and decision making [19,20].
Given the importance of detecting factors that contribute to the failure or survival of AVF, this study is the first to use a data mining technique, the decision tree, to examine the effect of biomarkers such as the SI/TIBC ratio, ferritin and hemoglobin on fistula survival among ESRD patients who were followed up.

Ethics statement
This study was approved by the Ethics Committee of Hasheminejad Clinical Research Development Center. All patients initially provided informed consent for access to their medical records for research purposes. All the methods were performed in accordance with the determined guidelines and regulations.

Settings
In this study, the archived clinical data of ESRD patients who underwent AVF surgery from 2015 to 2018 at Hasheminejad Kidney Center (HKC) in Iran were followed up using a predictive approach.
Patients' status was analyzed under two conditions: during surgery and after surgery.

Participants and Variables
The following inclusion and exclusion criteria were applied to 300 patients who had undergone radiocephalic AVF (Cimino fistula) creation (Figure 1). Seventy of these patients could not be followed up, and 67 had died, so there were no records of their AVF status. These patients were therefore excluded. In addition, 50 patients with more than 50% missing values were removed from processing to limit noise in the data. As a result, only 113 of the original 300 patients were analyzed.
Survival of the arteriovenous fistula was the target variable, classified into 2 groups (Table 1) based on medical records and consultation with surgeons. In the initial stages of the study, attributes such as ESR, CRP, RBC, WBC, PT, PTT, INR, HB, SI, TIBC and serum ferritin were evaluated for these patients. Of these parameters, PTT, PT, INR, WBC and RBC were excluded from processing because they had no significant effect on the target variable, and ESR and CRP were excluded because 80% of their values were missing.
Serum ferritin was categorized into 3 subgroups, as shown in Table 2, following the recommendations of international guidelines for the management of iron deficiency anemia (IDA) in CKD, which state that the upper limit of serum ferritin should be kept at <500-800 ng/ml [10, 21-29]. Moreover, other study groups in Europe and the US recommend that ferritin be kept at 400-600 ng/ml [30] and 200-1200 ng/ml in HD patients [17]. Given the data available for the study population, only a limited number of HB cutoff levels could be analyzed. HB was therefore classified into only two groups (Table 3), based on the HB levels proposed by the NKF-DOQI guidelines (11-12 g/dl) [31] and on studies that found a relation between AVF failure and HB<8 g/dl but not the other three strata (8-10, 10-12 and >12 g/dl) [6]. The SI/TIBC classification (Table 4) was selected based on the recommendations of the Kidney Disease Outcome Quality Initiative (K/DOQI) of the National Kidney Foundation [32] and the Best Practice Guidelines of the European Renal Association [23], which include a cutoff range of 20-50% for ISAT.

Data Mining Process
Real data in large databases and data warehouses usually suffer from three problems: they are incomplete, noisy and inconsistent. Preprocessing is therefore a pivotal step in knowledge discovery in databases (KDD), because high-quality data lead to high-quality decisions [18]. Accordingly, the database was constructed, and data cleaning and integration were performed, in IBM SPSS Statistics version 22. The decision tree analysis was carried out in RapidMiner Studio version 9.
Rule mining and decision trees were chosen because they help when there is no linear relationship between the attributes, and because they give a clearer picture of how variables affect each other, for example by identifying the ranges of a variable that influence the target variable.
Different decision tree (DT) algorithms can be used to classify data whose target variable is, for example, the patients' final status. With the aid of the rules extracted by the decision tree, the data mining system can be trained to learn the rules that governed the end state of patients. The system can then apply these rules to make predictions for patients whose final status has not been provided to it [33]. A decision tree has a flowchart-like structure consisting of three parts: 1) nodes, each of which represents a test on an attribute value, with the topmost node being the root node; 2) branches, which represent the test outcomes; and 3) leaves, which represent a class or class distribution [18]. We used a decision tree based on Chi-squared Automatic Interaction Detection (CHAID), which relies on the chi-squared attribute relevance test [34].
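The chi-squared relevance test underlying CHAID can be illustrated with a short calculation. The sketch below computes only the Pearson chi-squared statistic and its degrees of freedom for one predictor-by-target contingency table. It does not reproduce the category merging or p-value adjustment of a full CHAID implementation, and it is not the RapidMiner procedure used in this study. The function name and the counts are hypothetical.

```python
from typing import List, Tuple


def chi_squared(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom.

    Rows are categories of a predictor, columns are classes of the target.
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of predictor and target.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts (not from this study): rows are two strata of a
# predictor, columns are AVF survival < 1 year and >= 1 year.
hypothetical_table = [[10, 30], [20, 20]]
stat, dof = chi_squared(hypothetical_table)
print(f"chi-squared = {stat:.3f}, degrees of freedom = {dof}")
```

In a CHAID-style tree, a statistic of this kind is computed for each candidate attribute at a node, and the attribute with the most significant association with the target is chosen for the split.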
A confusion matrix was used to measure the accuracy of the prediction method. This representation is commonly used for supervised algorithms: each row of the confusion matrix represents the true class, and each column represents the predicted class [35].
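The accuracy, precision and recall rates reported in the Results follow from the four cells of a two-class confusion matrix. The sketch below uses the standard definitions. The function name and the counts are hypothetical and are not taken from the confusion matrices in the supplementary tables.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision and recall from a 2x2 confusion matrix.

    tp: positives predicted as positive, fn: positives predicted as negative,
    fp: negatives predicted as positive, tn: negatives predicted as negative.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=30, fp=10, fn=5, tn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A recall much higher than the accuracy, as seen for the trees in this study, indicates that the classifier assigns most cases to the positive class.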
After preprocessing the data, four CHAID decision trees were built in RapidMiner Studio version 9 (Figures 2, 3, 4 and 5). In all four configurations, AVF survival was the target attribute.
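The strata that appear in the Results (serum ferritin ≤200, 200-500 and >500 ng/ml; HB ≤8 and >8 g/dl; SI/TIBC <0.3 and ≥0.3) can be encoded as categorical attributes before a tree is built. The sketch below shows one way to do this. The function names and class labels are hypothetical, and the exact definitions in Tables 2-4 are in the supplementary files, so the boundary handling here is an assumption based on the inequalities quoted in the text.

```python
def ferritin_class(ferritin_ng_ml: float) -> str:
    """Three ferritin strata (ng/ml), with boundaries assumed from the text."""
    if ferritin_ng_ml <= 200:
        return "<=200"
    if ferritin_ng_ml <= 500:
        return "200-500"
    return ">500"


def hb_class(hb_g_dl: float) -> str:
    """Two hemoglobin strata (g/dl)."""
    return "<=8" if hb_g_dl <= 8 else ">8"


def si_tibc_class(ratio: float) -> str:
    """Two SI/TIBC strata, with the ratio expressed as a fraction."""
    return "<0.3" if ratio < 0.3 else ">=0.3"


# Hypothetical patient record for illustration only.
record = {"ferritin": 350.0, "hb": 9.5, "si_tibc": 0.25}
encoded = (
    ferritin_class(record["ferritin"]),
    hb_class(record["hb"]),
    si_tibc_class(record["si_tibc"]),
)
print(encoded)
```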

Results
A total of 113 ESRD patients were followed up, of whom 46 had AVF function lasting less than 1 year and 67 had AVF function lasting 1 year or longer. Of the patients, 74% were male and 26% were female, with a mean age of 51.91 years. The impact of biomarkers such as serum ferritin, the SI/TIBC ratio and HB on AVF survival in these patients was analyzed.
In the first configuration (Fig 2), the SI/TIBC ratio, serum ferritin and HB were considered using the classifications given in Tables 2, 3 and 4. DT1 in Fig 2 has the SI/TIBC attribute as its root (the derived rules are listed in Table 5), which indicates that AVF function in ESRD patients with SI/TIBC≥0.3 (AVF survival=57%) is poorer than in patients with SI/TIBC<0.3 (AVF survival=66%). The accuracy, precision and recall rates of the classification in DT1 are 66.07%, 67.50% and 81.82%, respectively, based on its confusion matrix (Table 6).

In the second configuration, shown in Fig 3, the SI/TIBC attribute was set aside. DT2 has the serum ferritin variable as its root node, and the corresponding rules are listed in Table 7. According to the extracted rules and the strata considered for serum ferritin, the AVF of ESRD patients with ferritin≤200 ng/ml performed better than that of the other two groups (AVF survival=70%). In the second class (200-500 ng/ml), which is considered the normal range for serum ferritin, the AVF of patients with HB>8 g/dl functioned more effectively than that of patients with HB≤8 g/dl. Moreover, AVF function in the third class of serum ferritin (>500 ng/ml) was less effective than in the first class (AVF survival=66%). The accuracy of the classification in DT2 is 62.50%, and the precision and recall rates are 62.24% and 92.24%, respectively, as shown in the confusion matrix of DT2 (Table 8).

In the third configuration, based on the results of the second, the serum ferritin variable was classified into two subgroups (≤200, >200 ng/ml) to see what results the new intervals would produce. DT3 (Fig 4) has the SI/TIBC variable as its root node. According to the rules of DT3, listed in Table 9, the AVF of ESRD patients with SI/TIBC<0.3 functioned better (AVF survival=65%) than that of patients with SI/TIBC≥0.3 (AVF survival=56%). Furthermore, among patients with SI/TIBC≥0.3, in the first stratum of serum ferritin (≤200 ng/ml), individuals with HB>8 g/dl had better AVF function than patients with HB≤8 g/dl. In the second stratum of serum ferritin (>200 ng/ml), the AVF did not function successfully. The accuracy, precision and recall rates from the DT3 confusion matrix (Table 10) are 64.29%, 65.48% and 83.33%, respectively.

In the fourth configuration, based on the previous ones, the HB variable was excluded to investigate its effect on the result. The SI/TIBC attribute was the root node in DT4 (Fig 5), and according to the extracted rules listed in Table 11, the results are the same as in the preceding configurations: patients with SI/TIBC≥0.3 have poorer AVF function (AVF survival=33%) than those with SI/TIBC<0.3 (AVF survival=65%). Also, among patients with SI/TIBC≥0.3, the AVF of patients in the first class of serum ferritin (≤200 ng/ml) performed better than that of patients with serum ferritin>200 ng/ml. The accuracy, precision and recall rates of DT4 from its confusion matrix are 62.29%, 64.77% and 86.36% (Table 12).
In the final configuration, we discarded the ferritin attribute, but RapidMiner could not construct a decision tree because of insufficient data.

Discussion
In this study, the effect of biomarkers such as serum ferritin, the SI/TIBC ratio and hemoglobin on AVF survival was evaluated with the aid of data mining procedures. The outcome of this process was four decision trees. According to the rules obtained from the decision trees, the ratio of serum iron to TIBC can affect the long-term function of the fistula (Tables 5, 9 and 11). Specifically, if this ratio is less than 30%, the AVF can function better than in patients with SI/TIBC≥30%.
No studies have examined the relationship between SI/TIBC and AVF survival, but research has reported the lowest risk of all-cause mortality in maintenance hemodialysis (MHD) patients with serum ISAT between 30% and 50%, and the lowest risk of cardiovascular (CV) death in the ISAT range of 35-50% [17]. A possible relation between higher iron marker levels and poor cardiovascular outcomes has also been reported [36]. In addition, several in vivo studies have shown an association between higher iron indices and poor outcomes in MHD patients [37]. However, many reports of the adverse effects of iron in dialysis patients are based on in vitro studies [38].
Serum ferritin levels, like the SI/TIBC ratio, affect fistula survival. As stated in the derived rules (Tables 7, 9 and 11), better AVF function was observed among ESRD patients with serum ferritin≤200 ng/ml, particularly when their HB was greater than 8 g/dl or their SI/TIBC ratio was less than 30%, in contrast with patients who had serum ferritin>200 ng/ml and HB≤8 g/dl. In accordance with the present study, Ishii et al. also found that ferritin may be associated with vascular incidents [36]. However, Morton et al. found no specific relation between serum ferritin and AVF failure [39]. Moreover, studies of the effect of ferritin levels on mortality have shown that maintaining serum ferritin between 300 and 800 ng/ml contributes to lower risks of all-cause and cardiovascular mortality in HD patients [40]. Nonetheless, Kim T et al. reported a relationship between higher mortality risk and a major rise in ferritin in HD patients with baseline ferritin≥200 ng/ml, and a minor rise in ferritin in patients with baseline ferritin≥800 ng/ml [41].

Conclusion
The present study was limited in its ability to detect additional attributes that may contribute to improved AVF performance. We recommend repeating it with a larger sample. The findings of this study can support hospital experts and help surgeons understand at which levels biomarkers such as ferritin and SI/TIBC contribute to more effective AVF function in ESRD patients, and which intervals are associated with susceptibility to AVF failure or thrombosis. As a consequence, they can take appropriate preventive measures.

Availability of data and materials
The data that support the findings of this study are available from HKC, but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of HKC.

Author contributions
A.M collected the data and wrote the manuscript. A.M and A.A contributed to the design and analysis of the model and to data interpretation under the supervision of M.KH. M.KH and M.R revised the manuscript. M.KH takes responsibility that this study has been reported honestly, accurately, and transparently, and that no important aspects of the study have been omitted. The final version of the paper was read and approved by all the authors.
Please see the supplementary files section to view the tables.
Figure 1. The inclusion and exclusion process for ESRD patients in this study.