Differential Diagnosis of β-Thalassemia Trait from Iron Deficiency Anemia: Application of Bayesian Decision Tree

Background: Several techniques have been proposed to discriminate between β-thalassemia trait (βTT) and iron deficiency anemia (IDA). This discrimination is clinically important but often difficult. For example, a patient with IDA who is misdiagnosed as having βTT is deprived of iron therapy. This study is the first application of a Bayesian tree-based method to the differential diagnosis of βTT from IDA.
Method: In this study, 907 patients over 18 years of age with microcytic anemia were enrolled. A Bayesian Logit Treed (BLTREED) model was used to discriminate βTT from IDA.
Results: Mean corpuscular volume (MCV) was the main predictor for diagnostic discrimination. The BLTREED model showed high sensitivity (96%), specificity (93%), accuracy (95%), Youden's index (0.89), and positive and negative predictive values in the differential diagnosis of βTT from IDA. The area under the ROC curve (AUC) was 0.98, indicating precise classification.
Conclusions: The BLTREED model showed excellent diagnostic accuracy for differentiating βTT from IDA. In addition, tree-based methods are easy to understand and require no statistical expertise, an advantage that can help physicians make the right clinical decision. Thus, we suggest the BLTREED model as a powerful data mining technique for developing sensitive and accurate diagnostic methods to discriminate between these two anemia disorders.

Highlights
1- To the best of our knowledge, this study is the first application of a Bayesian tree-based method to the differential diagnosis of βTT from IDA.
3- The proposed model will support medical decisions in the differential diagnosis of βTT from IDA, avoiding more expensive and time-consuming laboratory tests.

… the 10% prevalence was reported (1). To prevent iron overload and its complications caused by misdiagnosis and inappropriate treatment, and to determine the need for prenatal investigation of hemoglobin chain disorders, it is important to differentiate βTT from IDA (2). Hemoglobin electrophoresis and serum iron and ferritin levels are used to make a definitive differential diagnosis between βTT and IDA (3-5). However, to reduce the costs of the diagnostic workup, various major studies have …

Recently, the availability of powerful statistical software has enabled the application of data mining techniques to health-related data. Many studies have proposed advanced statistical methods and data mining techniques, such as decision tree methods (68), for the differential diagnosis of βTT and IDA to avoid much more … based methods for constructing a differential diagnosis scheme and investigating the performance of several tree-based methods for the differential diagnosis of βTT from IDA. Decision trees have advantages over traditional statistical methods such as discriminant analysis, generalized linear models (GLMs) and survival analysis. The main advantage of tree-based methods is the tree structure, which makes the clinical data easy to interpret and the results easy for medical researchers and clinicians to accept. However, these methods suffer from the greediness problem, which has several disadvantages: it limits the exploration of the tree space, makes future splits dependent on previous splits, generates optimistic error rates, and prevents the search from finding a global optimum (71).
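To make the greediness problem concrete, here is a minimal sketch of how a classic (CART-style) tree chooses one split: it scans candidate thresholds of a single predictor and keeps the one with the lowest weighted Gini impurity at the current node, without considering how that choice constrains later splits. Gini impurity is an assumption here (the text does not say which criterion the compared algorithms use), and the MCV values and labels in the example are hypothetical toy data, not study data.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary class labels (1 = βTT, 0 = IDA)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_greedy_split(x: List[float], y: List[int]) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted impurity) of the best single split x <= threshold.

    Only the locally best split at this node is evaluated; its effect on later
    splits is never considered, which is the greediness discussed above.
    Returns None when x has a single distinct value (no split is possible).
    """
    n = len(y)
    best: Optional[Tuple[float, float]] = None
    # The largest value is skipped because it would leave the right child empty.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: MCV values (fL) and labels, not taken from the study.
mcv = [65.0, 68.0, 70.0, 75.0, 78.0, 80.0]
label = [1, 1, 1, 0, 0, 0]
print(best_greedy_split(mcv, label))  # (70.0, 0.0)
```

A full tree repeats this search recursively in each child, so every split is fixed as soon as it is chosen; Bayesian tree methods instead place a distribution over whole trees.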

Bayesian tree approaches have been proposed to solve the greediness problem of tree-based methods. These Bayesian approaches can also quantify uncertainty, and they explore the tree space more thoroughly than classic tree approaches. Unlike classic tree methods, which use only the observations for data analysis, Bayesian approaches combine prior information with the observations. … One of the advantages of these methods is the graphical presentation of results, which makes them easy to interpret without statistical expertise … X (Y|X ~ f(Y|X, $\theta_i$)). In addition, by fitting a more sophisticated model at the terminal nodes (a logistic regression model for prediction in each terminal node), smaller and more interpretable trees are generated. In the BLTREED model, one subset of X can be used to generate the tree and another subset to fit the models in the terminal nodes (these two subsets may be overlapping or disjoint). In this Bayesian approach, $\theta_i = B_i$ denotes the set of regression coefficients of the logistic model fitted in the ith terminal node.

The tree prior p(T) is specified through a recursive tree-generating stochastic process, as follows (73, 74):
1- Start from a tree T that has only a root node (terminal node η).
2- Calculate the probability of splitting node η as follows:

$$p_{\text{split}}(\eta) = \alpha \, (1 + d_\eta)^{-\beta}$$

where $d_\eta$ is the depth of node η, α is the base probability of splitting a node, and β is the rate at which the propensity to split decreases with increasing tree size.
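The recursive tree-generating process can be sketched as a short simulation. This is a minimal sketch assuming the standard Bayesian CART form of the split probability, $\alpha(1 + d_\eta)^{-\beta}$ (Chipman, George and McCulloch); the names `split_probability`, `grow_tree_from_prior` and `count_leaves` are hypothetical, the choice of splitting variable and cut point is omitted, and α = 0.95, β = 2 are illustrative values, not those used in the study.

```python
import random


def split_probability(depth: int, alpha: float, beta: float) -> float:
    """Prior probability of splitting a terminal node at the given depth,
    assuming the form alpha * (1 + depth) ** (-beta)."""
    return alpha * (1.0 + depth) ** (-beta)


def grow_tree_from_prior(alpha: float, beta: float, rng: random.Random, depth: int = 0) -> dict:
    """Draw a tree shape from the prior by the recursive process described above.

    Start from a single terminal node at depth 0; split it with probability
    split_probability(depth, alpha, beta); if it splits, apply the same rule
    to its two children at depth + 1. Choice of splitting variable and cut
    point is omitted. An empty dict represents a terminal node.
    """
    if rng.random() < split_probability(depth, alpha, beta):
        return {
            "left": grow_tree_from_prior(alpha, beta, rng, depth + 1),
            "right": grow_tree_from_prior(alpha, beta, rng, depth + 1),
        }
    return {}


def count_leaves(tree: dict) -> int:
    """Number of terminal nodes in a tree built by grow_tree_from_prior."""
    if not tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


# Illustrative hyperparameters only (not the values used in the study).
rng = random.Random(42)
sizes = [count_leaves(grow_tree_from_prior(0.95, 2.0, rng)) for _ in range(1000)]
print(sum(sizes) / len(sizes))  # average number of terminal nodes under the prior
```

Increasing β or decreasing α makes deep splits less likely, which is how these two parameters act as a penalty on tree size.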
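The parameters α and β control the shape and size of the trees and provide a penalty against over-fitting. The posterior distribution p(T|X, y) was computed by combining the marginal likelihood p(y|X, T) with the tree prior p(T):

$$p(T \mid X, y) \propto p(y \mid X, T) \, p(T)$$

where

$$p(y \mid X, T) = \int p(y \mid X, \Theta, T) \, p(\Theta \mid T) \, d\Theta, \qquad p(y \mid X, \Theta, T) = \prod_{i=1}^{b} \prod_{h=1}^{n_i} f(y_{ih} \mid x_{ih}, \theta_i)$$

Here p(y|X, Θ, T), $(y_{ih}, x_{ih})$ and $n_i$ denote the data likelihood function, the observed values of the hth observation in the ith terminal node, and the number of observations in the ith terminal node, respectively; $\Theta = (\theta_1, \dots, \theta_b)$ collects the coefficients of the b terminal nodes of T, and p(Θ|T) is their prior.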
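The two ingredients of the posterior can be sketched for a fixed tree: the data likelihood as a sum of Bernoulli log-densities with a logit link over all observations in all terminal nodes, and the log prior of the tree shape as a sum of log split probabilities at internal nodes and log non-split probabilities at terminal nodes. This is a minimal sketch, not the BLTREED implementation: it evaluates p(y|X, Θ, T) for given coefficients rather than the marginal likelihood (which also integrates over Θ), it assumes the split probability $\alpha(1 + d)^{-\beta}$, it omits the probabilities of choosing splitting rules, and all function names and example coefficients are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One terminal node: (coefficients theta_i, rows of (x_ih, y_ih)).
# Each x_ih starts with 1.0 so that the first coefficient is the intercept.
Leaf = Tuple[Sequence[float], List[Tuple[Sequence[float], int]]]


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_data_likelihood(leaves: List[Leaf]) -> float:
    """log p(y | X, Theta, T) = sum_i sum_h log f(y_ih | x_ih, theta_i),
    with f a Bernoulli density under a logit link in every terminal node."""
    total = 0.0
    for coef, rows in leaves:
        for x, y in rows:
            eta = sum(b * v for b, v in zip(coef, x))  # linear predictor
            total += y * eta - softplus(eta)           # y*log(p) + (1-y)*log(1-p)
    return total


def log_tree_shape_prior(internal_depths: List[int], leaf_depths: List[int],
                         alpha: float, beta: float) -> float:
    """log p(T) for the tree shape: split probabilities at internal nodes and
    non-split probabilities at terminal nodes (rule-choice terms omitted)."""
    def p_split(d: int) -> float:
        return alpha * (1.0 + d) ** (-beta)
    return (sum(math.log(p_split(d)) for d in internal_depths)
            + sum(math.log(1.0 - p_split(d)) for d in leaf_depths))


# Hypothetical tree: the root is split once into two terminal nodes with made-up coefficients.
leaves = [
    ([0.5, -0.1], [((1.0, 2.0), 1), ((1.0, 4.0), 0)]),
    ([-1.0, 0.2], [((1.0, 3.0), 0)]),
]
print(log_data_likelihood(leaves) + log_tree_shape_prior([0], [1, 1], 0.95, 2.0))
```

Adding the two log terms gives the log of likelihood times prior for this fixed Θ; in the model described above, Θ is integrated out before the result is used to compare trees.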

The tree structure of the BLTREED model is shown in Figure 1. The first split of the tree was based on MCV, showing that MCV is an important predictor for differentiating between the types of hypochromic microcytic anemia. The second splitting variable in the tree was hemoglobin (Hb). The tree identified four homogeneous subgroups in the data, which yielded four diagnostic discrimination rules for differentiating between βTT and IDA (Table 2). This classification scheme shows that MCV ≤ 72.70 fL screens for βTT patients.
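The first split of the reported tree can be expressed as a simple routing rule. This minimal sketch applies only the MCV threshold of 72.70 fL given in the text; the Hb split, the four rules of Table 2 and the terminal-node logistic models are not reproduced, and the function name and example MCV values are hypothetical.

```python
from typing import List, Tuple

MCV_SPLIT_FL = 72.70  # first split of the BLTREED tree reported in the text (MCV, fL)


def first_split(mcv_values: List[float], threshold: float = MCV_SPLIT_FL) -> Tuple[List[int], List[int]]:
    """Indices of patients routed to the left branch (MCV <= threshold), which the
    text associates with screening for βTT, and to the right branch (MCV > threshold)."""
    left = [i for i, m in enumerate(mcv_values) if m <= threshold]
    right = [i for i, m in enumerate(mcv_values) if m > threshold]
    return left, right


# Hypothetical MCV values (fL) for four patients.
print(first_split([64.2, 72.7, 75.0, 81.3]))  # ([0, 1], [2, 3])
```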

The predictive performance of the model in differentiating between βTT and IDA was calculated from the confusion matrix (Table 3). The obtained tree showed a high true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV), negative predictive value (NPV), Youden's index, accuracy and F-measure in differentiating between βTT and IDA (Table 4).
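The metrics listed above all follow from the four cells of a 2x2 confusion matrix. The sketch below assumes βTT is the positive class and uses the usual definitions (F-measure as the harmonic mean of PPV and TPR); the function name and the example counts are hypothetical and are not the values in Table 3.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Metrics from a 2x2 confusion matrix with βTT as the positive class."""
    tpr = tp / (tp + fn)  # sensitivity (true positive rate)
    tnr = tn / (tn + fp)  # specificity (true negative rate)
    ppv = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    return {
        "TPR": tpr,
        "TNR": tnr,
        "PPV": ppv,
        "NPV": npv,
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "youden": tpr + tnr - 1.0,
        "f_measure": 2.0 * ppv * tpr / (ppv + tpr),  # harmonic mean of PPV and TPR
    }


# Hypothetical counts, not the values of Table 3.
for name, value in diagnostic_metrics(tp=48, fn=2, fp=3, tn=47).items():
    print(f"{name}: {value:.3f}")
```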

In addition, the model has a negative likelihood ratio (NLR) below 0.1, from which it can be concluded that BLTREED has good diagnostic accuracy for discriminating between the patients.

… it showed that patients with βTT have lower MCV values.
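The NLR statement can be checked against the rounded sensitivity (96%) and specificity (93%) reported in the abstract, and the same two values reproduce the reported Youden's index. Because the inputs are rounded, the NLR value is approximate.

```latex
\text{NLR} = \frac{1 - \text{sensitivity}}{\text{specificity}} = \frac{1 - 0.96}{0.93} \approx 0.043 < 0.1,
\qquad
J = \text{sensitivity} + \text{specificity} - 1 = 0.96 + 0.93 - 1 = 0.89
```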

In previous studies that used different conventional decision trees for the differential diagnosis of βTT from IDA, the first split of all algorithms was based on MCV, and these studies also concluded that MCV was an important predictor in the discrimination of IDA … Youden's index and F-measure showed that BLTREED has good diagnostic accuracy for discriminating between the patients. It correctly classified 96% of βTT patients. Furthermore, the AUC, as an index of overall performance, showed excellent and significant accuracy (0.99 and 0.98 in the training and test data, respectively) in the differential diagnosis of βTT and IDA.
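The AUC values above summarize how well the model's predicted scores rank βTT cases above IDA cases. A minimal sketch, assuming the scores are predicted probabilities of βTT, computes the AUC through its rank (Mann-Whitney) interpretation; the function name and the example scores are hypothetical, not outputs of the fitted model.

```python
from typing import Sequence


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive (βTT) case receives a
    higher score than a randomly chosen negative (IDA) case; ties count one half."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted probabilities of βTT, not outputs of the fitted model.
print(auc_mann_whitney([0.9, 0.4], [0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is fine for illustration; computing it once on the training data and once on the test data gives the two values reported above.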

Other studies that used different data mining techniques and decision trees based on the frequentist approach to fitting reported high performance and accuracy, but lower … QUEST, GUIDE and CRUISE for the differential diagnosis of βTT from IDA. They indicated that the CRUISE algorithm had the best diagnostic performance, similar to … and the present study; however, this classic algorithm uses a greedy search for tree generation and cannot explore the tree space as thoroughly as Bayesian tree approaches. Also, many studies compared the diagnostic performance of hematological …

The authors declare that they have no competing interests.

Matos JF, Dusse L, Borges KB, de Castro RL, Coura-Vital W, Carvalho MdG. A new index to discriminate between iron deficiency anemia and thalassemia trait.