We proposed an AI/ML model that predicts an individual's brucellosis infection using only gender, age, and CBC component data. A recent related study applied ML to develop a model that predicts a future diagnosis of lung cancer from routine clinical and laboratory data [27]. Because fewer input features are required, representative data are easier to obtain, which improves the accuracy and generalization of the model. To our knowledge, this is the first published study to build an AI/ML model that uses only routine laboratory results to predict brucellosis infection as a clinical decision support tool.
In model optimization, we considered the influence of variables, target populations, and ML algorithms. First, basic demographic information, CBC parameters, and their easily obtained inflammation-associated ratios (NLR and PLR) were used for feature selection. We chose CBC results because they are used in a wide variety of medical contexts and are more standardized than those of other tests. Using simple features reduces the degree of variation between patients and hence improves the general applicability of the prediction model. By introducing two ratio parameters derived from CBC results, we intended to enhance the model's predictive ability and learning efficiency. Higher PLR and NLR levels are commonly associated with diseases and their clinical courses, including infections such as COVID-19 [28], and reflect patients' inflammatory response and individual immune state with greater sensitivity [29]. Second, three prediction models with distinct case groups were evaluated in order to select the ideal case group for the prediction model. We tested the collinearity between variables and removed highly correlated ones, thereby reducing their impact on each model's goodness of fit. As a result, we obtained 22 variables in model A, 16 variables in model B, and 21 variables in model C (Supplemental Figure 1). Finally, on the basis of the three case groups, the performance of 21 prediction models developed with seven ML algorithms was compared. The GBM algorithm in model B demonstrated the highest accuracy (AUC = 0.997, 95% CI 0.994–0.999), with a specificity/sensitivity of 89.6%/99.8% and a positive predictive value (PPV)/negative predictive value (NPV) of 99.4%/96.7% (Fig. 3). The best-performing model used patients in the acute phase as the case group, perhaps because the characteristics of CBC results are more conspicuous during acute disease.
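As an illustration of this workflow, the sketch below derives NLR and PLR from CBC counts, prunes highly correlated features, and fits a gradient-boosting classifier whose AUC, sensitivity, specificity, PPV, and NPV are reported on a held-out split. It is a minimal sketch, not the study's actual pipeline: the column names (NEUT#, LYMPH#, PLT, brucellosis), the input file, the correlation threshold, and the default GBM settings are all assumptions.

```python
# Minimal sketch of the described workflow (not the authors' code).
# Assumes a pandas DataFrame with hypothetical CBC column names, numerically
# encoded demographics, and a binary "brucellosis" label.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix
from sklearn.model_selection import train_test_split

def add_inflammation_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Derive NLR and PLR from absolute neutrophil, lymphocyte and platelet counts."""
    df = df.copy()
    df["NLR"] = df["NEUT#"] / df["LYMPH#"]
    df["PLR"] = df["PLT"] / df["LYMPH#"]
    return df

def drop_collinear(df: pd.DataFrame, features: list, threshold: float = 0.9) -> list:
    """Drop one feature from each pair whose absolute correlation exceeds the threshold."""
    corr = df[features].corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return [f for f in features if f not in to_drop]

cbc = add_inflammation_ratios(pd.read_csv("cbc_cohort.csv"))  # hypothetical file
candidate = [c for c in cbc.columns if c != "brucellosis"]
selected = drop_collinear(cbc, candidate)

X_train, X_test, y_train, y_test = train_test_split(
    cbc[selected], cbc["brucellosis"],
    test_size=0.3, stratify=cbc["brucellosis"], random_state=0,
)

gbm = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
prob = gbm.predict_proba(X_test)[:, 1]
pred = (prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print(f"AUC={roc_auc_score(y_test, prob):.3f}  "
      f"sensitivity={tp / (tp + fn):.3f}  specificity={tn / (tn + fp):.3f}  "
      f"PPV={tp / (tp + fp):.3f}  NPV={tn / (tn + fn):.3f}")
```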
At the same time, measurement uncertainty (MU) was used together with SHAP to improve the model's explainability. Figure 4a shows the degree of contribution of each variable in the prediction model, ordered from top to bottom. Features shown in blue have values that are less predictive of the outcome, whereas the values of features shown in red are more predictive. BASO# was the feature that contributed most to our model. Figure 4b shows an increased probability of brucellosis infection in patients with low BASO#, MONO%, and R.CV and high RBC and MCV. Figure 4c helps determine the contribution of each variable to the prediction for individual patients. The MU of BASO%, EO#, and EO% is reported in Table 2; these three features accounted for a large proportion of the overall uncertainty among all variables, which may be patient-related, instrument-related, or operation-related. This means that the three features would affect the accuracy and generalization of our model if included. However, in our optimal model B, EO% was already removed at the feature-selection stage, and although BASO% and EO# were retained, Fig. 4a shows that these two features contributed little to the model. Thus, the influence of each feature on the accuracy and generalization of our model can be visualized by combining SHAP with the quantitative MU evaluation used in metrology.
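The figure-level interpretation above can be reproduced with the shap package. The short sketch below (an illustration, not the authors' code) computes tree-based SHAP values for the gbm model and X_test from the previous sketch and draws global summaries and a single-patient force plot analogous to Fig. 4a–c.

```python
# Illustrative SHAP analysis for a fitted tree-ensemble classifier.
import shap

explainer = shap.TreeExplainer(gbm)          # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X_test)  # one contribution per feature per patient

# Global view: mean |SHAP| ranks features by contribution (bar plot), and the
# beeswarm plot shows how feature values push predictions up or down.
shap.summary_plot(shap_values, X_test, plot_type="bar")
shap.summary_plot(shap_values, X_test)

# Local view: decompose a single patient's predicted risk into feature contributions.
shap.force_plot(explainer.expected_value, shap_values[0, :], X_test.iloc[0, :],
                matplotlib=True)
```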
There are two potential limitations to the use of this model. First, the training data set came from a patient cohort with moderate to life-threatening presentations at one of the largest infectious disease hospitals in North China. Thus, the model may not be appropriate for patients with mild brucellosis or patients outside North China. Second, the model was developed with a "control group" of ill patients at a tertiary hospital whose outcomes appeared normal, and this group may include non-healthy people who could affect the characteristics of the model.
Generally speaking, an ideal training set for a learning-based approach should cover the variability of samples across different demographic and geographic distributions, as well as co-morbidities and care settings (e.g., the emergency department, inpatient wards, and outpatient clinics), and should follow their changes over time. In practice, any training set is collected within a fixed time period and from accessible sources, so the representativeness of the data cannot satisfy all of these requirements. The deployment of software in medical scenarios cannot be accomplished in a single step. The US Food and Drug Administration published a white paper [30] last year specifically discussing how to properly regulate the adaptations/modifications of AI/ML models as a medical device. It is necessary to implement a continuous learning process that involves model monitoring, updating, and customization. MU in metrology can quantitatively evaluate and visualize the quality and variability of data sources. A metrology-driven AI/ML-based model offers a new strategy for adapting software of this kind to complex medical scenarios.
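A hypothetical sketch of the monitoring step in such a continuous learning process is given below: each newly labelled batch is scored, the AUC is tracked, and a drop below an assumed acceptance floor flags the model for review. The batch source, the 0.95 floor, and the retraining trigger are illustrative assumptions, not part of the study.

```python
# Illustrative post-deployment monitoring loop (assumption, not the study's procedure).
from sklearn.metrics import roc_auc_score

AUC_FLOOR = 0.95  # assumed acceptance limit for the deployed model

def monitor_batch(model, X_batch, y_batch, history):
    """Score one labelled batch, record its AUC, and flag performance degradation."""
    auc = roc_auc_score(y_batch, model.predict_proba(X_batch)[:, 1])
    history.append(auc)
    if auc < AUC_FLOOR:
        # In practice this would trigger review, recalibration, or retraining under
        # the change-control approach discussed in the FDA white paper [30].
        print(f"ALERT: AUC {auc:.3f} below floor {AUC_FLOOR}; schedule model update.")
    return history
```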