A Nomogram for Predicting the Risk of Tuberculosis Infection

Background: Tuberculosis (TB) is one of the main causes of death worldwide. The limitations of traditional diagnostic methods hinder early TB diagnosis and treatment to some extent. This study aimed to develop a TB infection risk model and to validate the ability of a nomogram to predict the risk of TB infection in a Chinese population. Methods: A prediction model was established on a training dataset of 272 patients. Least absolute shrinkage and selection operator (LASSO) regression was adopted to optimize feature selection for the TB infection risk model. Multivariate logistic regression analysis was then used to construct a predictive model combining the features selected by the LASSO regression. The discrimination, calibration, and clinical utility of this predictive model were assessed via the concordance index (C-index), calibration plot, area under the time-dependent receiver operating characteristic curve (AUC), and decision curve analysis (DCA). The clinical practicality of the nomogram was evaluated via the net reclassification index (NRI) and integrated discrimination improvement (IDI). Internal validation was performed by bootstrapping. Results: According to this predictive nomogram, the main predictors of TB infection risk were gender, age, smoking history, fever, hemoptysis, fatigue, emaciation, CD8, CD4/CD8, ESR, CRP, and abnormal liver function. The model exhibited good calibration and discrimination, with a C-index of 0.737 (95% CI: 0.685–0.789). Internal validation yielded a C-index of 0.688. The predictive model produced an AUC of 0.729 (95% CI: 0.677–0.781). Decision curve analysis showed that the TB nomogram was clinically useful when intervention was decided at a TB probability threshold of 13%. Moreover, the NRI and IDI indicated that the nomogram could be used as an effective predictive tool. Conclusion: The new TB probability nomogram developed herein for predicting TB infection risk combines various factors, such as gender, age, smoking history, fever, hemoptysis, fatigue,


Background
Tuberculosis (TB), which is caused by a single pathogenic bacterium, is one of the main causes of death worldwide. Almost a fourth of the world population is infected with Mycobacterium tuberculosis (MTB) [1]. According to the World Health Organization (WHO), an estimated 10 million new TB cases occurred worldwide in 2018. In recent years, the incidence and mortality of TB in China have gradually decreased, but its prevalence remains second only to that of India; TB cases in China account for 9% of the global total [2]. Undoubtedly, TB is an urgent public health concern. Traditional methods for detecting MTB rely on etiological diagnosis, such as acid-fast staining smear and MTB culture. However, these methods require high-quality clinical specimens. Moreover, they have a low detection rate and poor sensitivity and are time consuming; furthermore, their use may result in misdiagnosis or delayed diagnosis [3,4]. These disadvantages hinder early TB diagnosis and treatment to some extent. Molecular biology examination has recently become an important supplement to traditional diagnostic methods. Molecular methods are rapid, accurate, and efficient and offer high throughput. These methods represent a new avenue for TB diagnosis, treatment, and control. However, molecular methods are expensive. Thus, their application in grassroots medical institutions and far-flung areas is severely limited.
TB is primarily transmitted through the respiratory tract. In hospital settings and public places, people should be protected from pathogens. Thus, measures to prevent the airborne spread of TB must be taken to protect people at risk of TB infection. However, recognizing people at high risk of TB infection is difficult. Therefore, a practical risk prediction model for TB must be developed. A nomogram is an easy-to-use tool that presents a prediction model as a simple and intuitive chart. It quantifies the risk of a clinical event by graphically representing the influence of each predictive factor on the outcome. Hence, a reader or clinician can more easily understand the impact of each predictor on the result. For example, with a prognostic nomogram, a clinician can predict 1-, 3-, and 5-year survival by summing the points of all predictors for a given patient [5].
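As a rough illustration of how a nomogram turns a logistic regression model into a points chart, the following minimal sketch scales each predictor's contribution so that the most influential predictor spans 0 to 100 points and then maps a patient's total points back to a predicted probability. The intercept, coefficients, predictor ranges, and function names are hypothetical placeholders chosen for illustration; they are not the fitted values of the nomogram in this study.

```python
import math

# Hypothetical logistic-regression intercept, coefficients, and predictor ranges
# (illustrative only; NOT the fitted values of the TB nomogram).
INTERCEPT = -2.0
PREDICTORS = {
    # name: (coefficient, minimum value, maximum value)
    "male": (0.6, 0, 1),
    "age": (0.02, 20, 80),
    "smoking": (0.8, 0, 1),
}


def points_per_unit():
    """Scale factor so that the predictor with the largest effect spans 0-100 points."""
    max_effect = max(abs(b) * (hi - lo) for b, lo, hi in PREDICTORS.values())
    return 100.0 / max_effect


def predictor_points(name, value):
    """Points for one predictor value, measured from its lowest-risk end."""
    b, lo, hi = PREDICTORS[name]
    reference = lo if b >= 0 else hi
    return abs(b) * abs(value - reference) * points_per_unit()


def total_points(patient):
    """Sum the points of all predictors for one patient (dict of name -> value)."""
    return sum(predictor_points(name, value) for name, value in patient.items())


def probability_from_points(points):
    """Map total points back to the linear predictor, then to a probability."""
    baseline = INTERCEPT + sum(
        b * (lo if b >= 0 else hi) for b, lo, hi in PREDICTORS.values()
    )
    linear_predictor = baseline + points / points_per_unit()
    return 1.0 / (1.0 + math.exp(-linear_predictor))


patient = {"male": 1, "age": 50, "smoking": 1}
risk = probability_from_points(total_points(patient))
```

Reading the total points off the chart and converting them to a probability is therefore equivalent to evaluating the logistic model directly; the chart only replaces the arithmetic with a graphical lookup.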
In this study, we identified a combination of variables that can be used to accurately predict the risk of TB infection. A nomogram for this purpose was established. This nomogram can aid physicians in diagnosing TB and applying appropriate treatment.

Patients And Methods
Patients
A retrospective review of 272 consecutive patients who attended our hospital from January 2014 to December 2014 was conducted. TB diagnosis was based on acid-fast staining smear of MTB. Secondary analysis was ethically approved by the Xi'an Chest Hospital (Xi'an, China), and all subjects provided informed consent. No patient information was made public.

Statistical analysis
All data, comprising demographic, disease, and therapy features, are expressed as counts (%). Statistical analysis was performed using the R software (Version 3.1.1; https://www.R-project.org).
Least absolute shrinkage and selection operator (LASSO) regression was used to select the best predictive features among the risk factors of patients with TB. Features with nonzero coefficients in the LASSO regression model were selected. Multivariable logistic regression analysis was then used to develop a predictive model and a nomogram for TB. The features were reported as odds ratios (ORs) with 95% confidence intervals (CIs) and P-values. Statistical significance levels were two-sided. Sociodemographic variables significant at a P-value of 0.05 were included in the model, and all variables related to disease and therapy features were also included. All potential predictors in the cohort were applied in establishing the predictive model of TB risk. The calibration of the nomogram was evaluated by plotting calibration curves. Harrell's C-index was measured to quantify the discriminatory performance of the nomogram. The area under the curve (AUC) was calculated to determine the discrimination ability of the model. Bootstrapping (resampling = 113) was performed to internally validate the model. The accuracy of the nomogram was compared with that of a doctor's judgment based on clinical experience by determining the net reclassification improvement (NRI) and integrated discrimination improvement (IDI). As alternatives to AUC, NRI and IDI were employed to assess improvements in risk prediction and to measure the practicality of this novel model. Decision curve analysis (DCA) was conducted to determine the clinical practicality and advantages of this predictive model.
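To make the NRI and IDI comparison concrete, the following minimal sketch computes both quantities for a reference prediction (for example, a doctor's judgment) and a new prediction (for example, the nomogram). It assumes a two-category NRI defined by a single risk threshold; the form of NRI used in this study is not stated, so the threshold, the function name, and the toy data are assumptions made only for illustration.

```python
def nri_idi(y, p_old, p_new, threshold=0.5):
    """Two-category NRI and IDI for binary outcomes y (1 = event, 0 = non-event).

    p_old and p_new are predicted probabilities from the reference and new model.
    The single-threshold NRI is an assumption of this sketch.
    """
    events = [i for i, yi in enumerate(y) if yi == 1]
    nonevents = [i for i, yi in enumerate(y) if yi == 0]

    def moved(i):
        # +1 if reclassified upward, -1 if downward, 0 if unchanged
        return int(p_new[i] >= threshold) - int(p_old[i] >= threshold)

    # NRI: net upward movement among events plus net downward movement among non-events
    nri_events = sum(moved(i) for i in events) / len(events)
    nri_nonevents = -sum(moved(i) for i in nonevents) / len(nonevents)
    nri = nri_events + nri_nonevents

    def mean(values):
        return sum(values) / len(values)

    # IDI: change in the difference of mean predicted risk between events and non-events
    gain_events = mean([p_new[i] for i in events]) - mean([p_old[i] for i in events])
    gain_nonevents = mean([p_new[i] for i in nonevents]) - mean([p_old[i] for i in nonevents])
    idi = gain_events - gain_nonevents
    return nri, idi


# Toy data (fabricated for illustration only, not study data)
y = [1, 1, 1, 0, 0, 0]
p_old = [0.4, 0.6, 0.7, 0.6, 0.3, 0.2]
p_new = [0.6, 0.7, 0.8, 0.4, 0.2, 0.1]
nri, idi = nri_idi(y, p_old, p_new)
```

Positive values of both quantities indicate that the new prediction reclassifies patients in the correct direction and separates events from non-events more widely than the reference prediction.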

Results
Patients' characteristics
A total of 385 patients, including 243 patients with TB and 142 patients with other diseases, were treated in our hospital between January 2014 and December 2014. About 70% of the patients (272) were randomly selected and analyzed to build the model, and the remaining 30% (113) were used to validate the model. Information on the patients in the two groups, such as demographic, disease, and therapy characteristics, is summarized in Table 1.
Of the 24 candidate features, only 12 potential predictive factors, namely, gender, age, smoking history, fever, hemoptysis, fatigue, emaciation, CD8, CD4/CD8, ESR, CRP, and abnormal liver function, remained after LASSO regression (Figures 1A and 1B; Table 2). These factors had nonzero coefficients in the LASSO regression model. The results of the logistic regression analyses of these 12 factors are given in Table 2. A model incorporating the aforementioned independent predictive factors was constructed and displayed as a nomogram (Figure 2).

Performance of the TB risk nomogram in the cohort
According to the calibration curves, the predictions of the TB infection risk nomogram agreed well with the observed outcomes in this cohort (Figure 3). The C-index for the predictive nomogram was 0.737 (95% CI: 0.685-0.789), and bootstrap validation yielded a C-index of 0.688. These results indicated the good discriminative capability and predictive ability of the model.
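For a binary outcome, the C-index is the proportion of event/non-event pairs in which the event receives the higher predicted probability, which coincides with the AUC. The following minimal sketch computes it and a simple percentile bootstrap interval. It resamples fixed predictions only and does not refit the model in each resample, so it is a simplification and not the validation procedure used in this study; the function names, number of resamples, and toy data are assumptions.

```python
import random


def c_index(y, p):
    """Proportion of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    concordant = 0.0
    pairs = 0
    for yi, pi in zip(y, p):
        if yi != 1:
            continue
        for yj, pj in zip(y, p):
            if yj != 0:
                continue
            pairs += 1
            if pi > pj:
                concordant += 1.0
            elif pi == pj:
                concordant += 0.5
    return concordant / pairs


def bootstrap_c_index(y, p, n_resamples=200, seed=0):
    """Percentile bootstrap interval for the C-index (predictions are not refitted)."""
    if not 0 < sum(y) < len(y):
        raise ValueError("y must contain both events and non-events")
    rng = random.Random(seed)
    n = len(y)
    values = []
    while len(values) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            values.append(c_index(ys, [p[i] for i in idx]))
    values.sort()
    lower = values[int(0.025 * n_resamples)]
    upper = values[min(n_resamples - 1, int(0.975 * n_resamples))]
    return lower, upper


# Toy data (fabricated for illustration only, not study data)
y = [1, 1, 0, 0]
p = [0.9, 0.4, 0.5, 0.1]
apparent = c_index(y, p)
lower, upper = bootstrap_c_index(y, p)
```

An optimism-corrected estimate would additionally require refitting the model on each resample and comparing its performance on the resample with its performance on the original data.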
Validation of the nomogram for TB risk prediction
The training data were utilized to plot ROC curves. The AUC of the training set was statistically significant

Clinical use
The decision curve of the TB nomogram is plotted in Figure 5. According to the decision curve, the nomogram showed clinical usefulness when the threshold probabilities of patients and doctors were 13% and 83%, respectively. Within this range, the net benefit was comparable, with several overlapping parts, according to the TB risk nomogram.
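The quantity plotted in a decision curve is the net benefit at each threshold probability, computed as the true-positive rate minus the false-positive rate weighted by the odds of the threshold. The following minimal sketch evaluates the net benefit of a model and of the treat-all strategy (the treat-none strategy has a net benefit of zero). The function names and toy data are assumptions for illustration and are not taken from this study.

```python
def net_benefit(y, p, threshold):
    """Net benefit of intervening on patients whose predicted risk is >= threshold."""
    n = len(y)
    true_pos = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    false_pos = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y, threshold):
    """Net benefit of intervening on every patient regardless of predicted risk."""
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (fabricated for illustration only, not study data)
y = [1, 1, 0, 0]
p = [0.9, 0.6, 0.7, 0.2]
model_nb = net_benefit(y, p, 0.5)
treat_all_nb = net_benefit_treat_all(y, 0.5)

# A decision curve repeats this over a grid of thresholds, e.g. 0.13 to 0.83
curve = [(t / 100.0, net_benefit(y, p, t / 100.0)) for t in range(13, 84, 10)]
```

A model is considered useful over the range of thresholds in which its net benefit exceeds that of both the treat-all and treat-none strategies.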

Discussion
Nomograms are widely used for prediction in tumors and various other diseases [6-8]. They rely on a user-friendly digital interface that offers high precision and an easily understood prognosis, facilitating rational clinical decision-making [5]. Nomograms have higher clinical value than traditional predictive models [9]. In the present study, a nomogram for TB risk prediction was established. It consisted of 12 variables, namely, gender, age, smoking history, fever, hemoptysis, fatigue, emaciation, CD8, CD4/CD8, ESR, CRP, and abnormal liver function. This nomogram is a comparatively precise predictive tool for diagnosing patients with TB. Internal validation in the cohort showed that this model has good discriminative and calibration ability. In particular, the C-index in the internal validation suggested that the nomogram can be widely utilized in clinical settings.
TB is a prominent public health concern worldwide. According to previous studies, recognition of TB infections remains unsatisfactory. Physicians recognize that over half of patients have active TB, but they still fail to enact preventive measures [10]. As the cost of healthcare increases and resources become scarce, suspected cases must be promptly and correctly identified. Moreover, therapy is commonly postponed for patients with TB who do not undergo sputum or drug-resistance testing and for patients with sputum-negative TB. In some cases, patients undergo improper therapy. Several studies have examined the factors related to TB infection, with varying degrees of success in predicting risk.
In the present study, a nomogram for TB risk prediction was established on the basis of 12 variables for patients with TB. LASSO regression analysis was performed to filter the variables included in the nomogram. Clinically important predictive factors were evaluated. The results showed that smoking history was one of the predictors in the nomogram. Systematic reviews revealed that smoking is adversely associated with the global TB epidemic [11]. The role of smoking in TB pathogenesis is associated with ciliary dysfunction, decreased immune responses, defects in the immunological reactions of macrophages, with or without reduced CD4 count, and increased susceptibility to MTB infection [12]. The TB epidemic affects males and females disproportionately. Borgdorff et al. [13] provided compelling evidence that males are more frequently infected with TB than females. However, other researchers argued that this conclusion reflects reporting bias, as TB infection among females may be underreported in developing regions [14]. In the present study, gender was found to be one of the predictors. TB incidence was higher in males than in females, probably because most men are the breadwinners of their families and are hence more exposed to TB infection than females. Moreover, the elderly are more susceptible than younger individuals to infectious diseases, especially respiratory tract infections [15]. According to the WHO Global Tuberculosis Report (2014), the greatest TB infection rates worldwide were recorded in people aged 45-55 years. In the Western Pacific, Eastern Mediterranean, and Southeast Asia regions, TB infections are high among the elderly, especially among those over 65 years old [16]. This report was consistent with our nomogram prediction. Kara [17] reported that four clinical factors, including chronic symptoms such as persistent fever, fatigue, and emaciation, are associated with TB. This finding was consistent with the results of our prediction model.
Several laboratory indicators were also included in the model.

Declarations
Huan-qing Liu performed statistical analysis and data interpretation, and was responsible for the quality control of data and algorithms. Ting-ting Li performed literature research. Jun Lyn contributed to the study concept and study design. All authors contributed to writing of the manuscript and approved the final version.

Funding
No funding.
Availability of data and material
The datasets analyzed during the current study are available from the corresponding author upon reasonable request.

Figure 1. (A) The partial likelihood deviance (binomial deviance) curve was plotted versus log(lambda). Dotted vertical lines were drawn at the optimal values by using the minimum criteria and 1 SE of the minimum criteria (the 1-SE criteria). (B) LASSO coefficient profiles of the 24 features. A coefficient profile plot was produced against the log(lambda) sequence. A vertical line was drawn at the value selected using fivefold cross-validation, where the optimal lambda resulted in 12 features with nonzero coefficients.

The AUC of the model, representing its discrimination. Note: The dotted vertical lines indicate the 95% confidence interval. AUC, area under the curve.