Predictive models for Talaromyces marneffei infection in HIV-infected patients using routinely collected data

Late diagnosis of Talaromyces marneffei (T. marneffei) infection in patients with HIV/AIDS is strongly associated with greater mortality. To date, no effective predictive model for T. marneffei infection has been established in clinical practice. We aimed to identify a non-culture-based method for rapid detection of T. marneffei infection in HIV/AIDS patients.

Methods
The prediction models were initially constructed using patients in a retrospective cohort study. We obtained demographic, clinical and laboratory data for each individual. Univariate comparisons, logistic regression, Random Forest (RF) analysis and receiver-operating characteristic (ROC) curves were used to identify and evaluate the predictive factors of T. marneffei infection status.

Results
HIV-infected patients whose baseline was characterized by weight loss, typical skin lesions, peripheral or abdominal lymphadenopathy (POAL), hepatomegaly, splenomegaly, decreased lymphocyte count, abnormal aspartate aminotransferase (AST) level, higher AST to alanine aminotransferase (ALT) ratio index (AARI) level (>1) and lower (<50 cells/mL) CD4+ T-cell counts had an increased risk of T. marneffei infection. Skin lesions, POAL, AST level, AARI and CD4+ T-cell count were good classifiers of T. marneffei infection by RF analysis. The RF model had relatively high power [area under the ROC curve (AUC): 0.859] to predict T. marneffei infection in the present study. A new indicator combining AST level and AARI could increase the classification power of the model (AUC: 0.877).


Conclusion
Our data suggest that accurate assessments for T. marneffei infection can be obtained using routinely collected data of patients with HIV. The prediction model could be used to identify HIV patients who currently have early-stage T. marneffei infection, which would benefit both patients and clinicians.

Background
Talaromyces marneffei (T. marneffei), previously named Penicillium marneffei [1,2], is a pathogenic, thermally dimorphic fungus. Systemic mycosis and disseminated infection caused by T. marneffei are often found in patients with secondary immunodeficiency syndromes, especially patients with AIDS [3], and are mainly prevalent in Southeast Asia and southern China. The prevalence of systemic T. marneffei infection has grown rapidly, consistent with the increasing incidence of HIV infection [3]. The HIV epidemic has transformed T. marneffei from a rare infection to a leading AIDS-defining diagnosis in this region [4][5][6][7]. The proportion of HIV patients infected with T. marneffei is currently ~10% in China [3,6], and approximately 99% of T. marneffei infections were reported from the southern regions, of which 43% were from Guangxi [6]. Thus, infection with T. marneffei is common among HIV/AIDS patients in Guangxi, ranking fourth among all AIDS-related complications with a 16.1% prevalence [8].
Previous studies have shown that the mortality rate of T. marneffei-infected HIV/AIDS patients was significantly greater than that of HIV/AIDS patients without T. marneffei infection [8,9]. Late diagnosis is the main reason for the high fatality in HIV/AIDS patients co-infected with T. marneffei; many cases of T. marneffei infection remain undiagnosed until later stages [4,5]. T. marneffei infection has a considerably high mortality (50.6%-91%) if the disease is not diagnosed and treated in time [6,10].
Antifungal treatment during the early stage of T. marneffei infection can effectively control the disease [9,11] and reduce death from T. marneffei infection [6,12] .
Thus, early detection is critical for improving the prognosis of T. marneffei infection. However, clinical experience shows that the diagnosis of T. marneffei infection is difficult because its clinical manifestations may mimic other AIDS-related opportunistic infections [13,14]. Microbiological culture is the gold standard method for diagnosis of T. marneffei infection. However, this method requires a prolonged incubation time of 7-10 days to isolate and identify the pathogen from clinical specimens, frequently delaying appropriate antifungal therapy. In addition, the sensitivity of fungal culture from blood in HIV-infected patients is about 76.7% [15]. Thus, development of a non-culture-based method for rapid detection of T. marneffei infection is urgently needed. Recently, a number of other diagnostic methods have been established, including different polymerase chain reaction (PCR) methods [16][17][18][19], high-throughput sequencing of specimens [20], enzyme-linked immunosorbent assay (ELISA) [21] and Mp1p tests [22]. However, these costly approaches, with sensitivity ranging from 10% to 100%, have not been applied in clinical practice [23].
Notably, recent studies have indicated that accurate risk assessments for disease progression and clinical outcomes could be obtained for patients using only data routinely collected in clinical practice [24][25][26]. At present, there is hardly any effective predictive model for T. marneffei infection in clinical practice. Therefore, the aim of this study was to analyse correlations between T. marneffei infection and clinical parameters in a large cohort of HIV-infected patients and to predict T. marneffei infection status using data that are routinely collected in clinical practice. This study has the potential to facilitate rapid diagnosis of T. marneffei infection, thus making early therapeutic management of T. marneffei infection in HIV-infected patients possible.

Study Setting and Population
The prediction models were initially constructed using patients in a retrospective cohort study conducted at the People's Hospital of Liuzhou, a large-scale comprehensive tertiary hospital in Liuzhou and a large treatment center for HIV/AIDS in the city. Individuals included in this study were more than 18 years of age, had HIV-1 infection, and had culture-proven T. marneffei infection from January 2015 to December 2018. Exclusion criteria were tumors; coexisting tuberculosis; hepatitis A, B, C, D or E virus coinfection; liver damage caused by drugs, alcohol or autoimmune hepatitis; active opportunistic infections; evidence of cirrhosis or suspected liver cancer; pregnancy or lactation; abnormal renal function; and severe heart, lung or brain disease. T. marneffei infection was diagnosed by isolating T. marneffei from blood, skin scrapings, bone marrow, lymph node, and/or other body fluid samples according to standard culture techniques [3].

Data collection
Only data from the first T. marneffei admission were extracted from the medical records. Data that are routinely collected in clinical practice included the demographics, clinical characteristics (e.g. fever, cough and skin lesions), and associated laboratory examinations of the patients. We normalized the values of baseline parameters by log10 transformation of hemoglobin, white blood cell count (WBC), platelets, lymphocyte count, creatinine and urea values. Baseline laboratory results abstracted also included CD4 T-cell count, pathological biopsy, aspartate aminotransferase (AST) and alanine aminotransferase (ALT). Using these values, the AST to ALT ratio index (AARI) was calculated. Then, data from the medical records were independently double-entered into a Microsoft Access 2013 database specially defined for this study to ensure data integrity.
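As an illustration of the preprocessing described above, the following minimal sketch applies a log10 transformation to the listed laboratory values and derives the AARI as AST divided by ALT. The field names, the handling of missing or non-positive values, and the example patient are assumptions made for illustration only; they are not taken from the study database.

```python
import math

# Hypothetical field names; the study's actual database schema is not described.
LOG10_FIELDS = ("hemoglobin", "wbc", "platelets", "lymphocytes", "creatinine", "urea")


def preprocess(record):
    """Return a copy of one baseline record with log10-transformed labs and the AARI."""
    out = dict(record)
    for field in LOG10_FIELDS:
        value = record.get(field)
        # log10 is only defined for positive values; anything else is kept as missing.
        if value is not None and value > 0:
            out["log10_" + field] = math.log10(value)
        else:
            out["log10_" + field] = None
    ast, alt = record.get("ast"), record.get("alt")
    # AARI = AST / ALT; undefined when either value is missing or ALT is zero.
    out["aari"] = ast / alt if ast is not None and alt else None
    return out


# Made-up example values, not patient data from the study.
patient = {"hemoglobin": 100.0, "wbc": 4.0, "platelets": 150.0,
           "lymphocytes": 1.0, "creatinine": 70.0, "urea": 5.0,
           "ast": 90.0, "alt": 45.0}
processed = preprocess(patient)
print(processed["log10_hemoglobin"], processed["aari"])  # 2.0 2.0
```

In this made-up record the AARI is 2.0, which would fall in the "higher AARI (>1)" group used in the analysis.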

Machine learning methods, predictive models construction and validation
Random forest (RF) analysis, a decision tree-based ensemble statistical method, was applied as a multivariate method to identify the variables that best partitioned the overall study population according to T. marneffei infection status. RF is an ensemble learning method that provides an unbiased selection of the variables that make the largest contributions to the classification. The mean decrease in accuracy (MDA) was used to interpret the RF results; it estimates how much excluding (or permuting) each variable reduces the accuracy of the model during the out-of-bag error calculation phase.
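The idea behind the MDA can be sketched in a few lines: permute one variable at a time and record how much the classification accuracy drops. The sketch below is an illustration under stated assumptions, not the analysis used in this study: a hand-written rule (`toy_predict`, hypothetical) stands in for the fitted forest, accuracy is computed on the full toy data set rather than on out-of-bag samples, and the variable names and values are made up.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def mean_decrease_accuracy(predict, rows, labels, n_repeats=20, seed=0):
    """Permute each variable in turn and return its mean drop in accuracy."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = {}
    for feature in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[feature] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row with only this variable's value replaced.
            permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances[feature] = sum(drops) / n_repeats
    return importances


def toy_predict(row):
    """Hypothetical stand-in classifier (NOT a random forest)."""
    return int(row["skin_lesions"] == 1 or row["aari"] > 1.0)


# Made-up toy data: the label depends on skin lesions and AARI, never on age.
rows = [
    {"skin_lesions": 1, "aari": 0.8, "age": 34},
    {"skin_lesions": 1, "aari": 0.9, "age": 51},
    {"skin_lesions": 0, "aari": 1.6, "age": 42},
    {"skin_lesions": 0, "aari": 2.1, "age": 29},
    {"skin_lesions": 0, "aari": 0.7, "age": 38},
    {"skin_lesions": 0, "aari": 0.9, "age": 45},
    {"skin_lesions": 0, "aari": 0.6, "age": 57},
    {"skin_lesions": 0, "aari": 0.8, "age": 31},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

mda = mean_decrease_accuracy(toy_predict, rows, labels)
for name, value in sorted(mda.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(value, 3))
```

A variable the classifier never uses (here, age) gets an MDA of exactly zero, while variables that drive the classification get a positive MDA, which is what allows a ranked list of classifiers to be drawn.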
Finally, the variables considered important discriminators of T. marneffei infection were selected from the multivariate comparisons (P < 0.05) or the RF analysis (larger MDAs in the classification model). RF and logistic regression were then used for analysis and to build the diagnostic models.
Receiver-operating characteristic (ROC) curves were generated to quantify how accurately the variables were able to discriminate between the groups. Models were evaluated by constructing the confusion matrix for test data. Model performance was compared using the area under the receiver operating characteristic curve (AUC) and 95% confidence intervals (CI). For RF analysis, we replicated the procedure 500 times to obtain the mean and 95% CI of the AUC and validated each of these models on all the data within this cohort. In addition, sensitivity and specificity were measured for each model.
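The evaluation measures named above can be made concrete with a short sketch. It computes the AUC as the probability that a randomly chosen infected patient receives a higher score than a randomly chosen non-infected patient, derives sensitivity and specificity from the confusion matrix at a cut-off, and summarizes 500 replicates as a mean with a percentile-based 95% interval. The replication scheme is an assumption: the text does not state how the 500 replicates were generated, so bootstrap resampling is used here purely for illustration, and the scores, labels and the 0.5 cut-off are made up.

```python
import random


def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Build the 2x2 confusion matrix at a cut-off and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def replicate_auc(scores, labels, n_replicates=500, seed=0):
    """Mean AUC and simple percentile 95% interval over bootstrap replicates (assumed scheme)."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_replicates:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            values.append(auc([scores[i] for i in idx], ys))
    values.sort()
    mean = sum(values) / len(values)
    lower = values[int(0.025 * len(values))]
    upper = values[int(0.975 * len(values)) - 1]
    return mean, lower, upper


# Made-up predicted probabilities and true infection status (1 = infected).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(auc(scores, labels))                      # 0.9375
print(sensitivity_specificity(scores, labels))  # (0.75, 0.75)
print(replicate_auc(scores, labels))
```

In the toy data, 15 of the 16 infected/non-infected pairs are ranked correctly, which gives the AUC of 0.9375.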

Statistical analysis
A descriptive analysis of patients' characteristics was carried out using frequency tables for categorical variables and continuous variables. Data were presented as numbers (%) as appropriate.
Normally distributed continuous variables were described as mean ± standard deviation, and non-normally distributed continuous variables as median and interquartile range (IQR). To find differences between the two groups, we compared their demographic, clinical and laboratory characteristics at baseline. Univariate comparisons were performed using the nonparametric Mann-Whitney test or Student's t-test for continuous variables and Pearson's Chi-square test or Fisher's exact test for categorical variables. Variables with a p-value < 0.05 in the univariable analysis were then tested in multivariable models using forward stepwise procedures. A two-sided p-value < 0.05 was considered statistically significant.

platelets, lymphocyte count, but AST, ALT and AARI were significantly higher in T. marneffei infected patients than in non-T. marneffei infected patients. Therefore, the results suggest that these baseline variables favoured T. marneffei infection.

Predictive models by machine learning method
RF analysis was used as a multivariate method to evaluate the classificatory value of baseline characteristics for T. marneffei infection (Fig. 2). The variable importance graph for the predictive model is shown in Fig. 2A. The RF algorithm confirmed that skin lesions, AARI, POAL, AST, and CD4+ T-cell count were good classifiers in the ranked list of the classification scheme (Fig. 2A). Moreover, it is important to highlight that these variables were also the most significant variables in either the univariate test or the logistic regression, which suggested that they are powerful indicators of T. marneffei infection over time.
Assessing and comparing model performance
Finally, we evaluated the potential clinical usefulness of our highlighted candidates. Accordingly, logistic regression and the RF algorithm were further used to construct predictive models incorporating these parameters. Sensitivity and specificity of each method are shown in Table 4, and ROC curves are plotted in Fig. 2B.

Table 4 Specificity and sensitivity of classification and AUC of predictive models based on two methods.

correctly classify T. marneffei infection in this study. As expected, the AUC value increased by 0.018, specificity increased by 5.0% and the percentage of correct classification improved (Fig. 2D) (Table 4).

Discussion
Early and accurate diagnosis of T. marneffei infection is crucial to initiate the proper management and treatment in clinical practice, especially for HIV infected patients.
Accurate risk assessment for T. marneffei infection among HIV patients represents the major strength of our study. For predictive modelling, two models were constructed separately to find the most appropriate one in this study. We suggest RF as a machine learning method to aid in prediction and diagnosis for binary classification of T. marneffei infection status; it has a relatively higher power (0.877) compared with logistic regression models. Owing to their reliance on classical forms of statistical analysis that restrict the number of predictor variables, logistic regression models are primarily limited to baseline data and may have only moderate accuracy in T. marneffei risk prediction. RF analysis was used to build prediction models because this method is able to incorporate many predictor variables without compromising the accuracy of the risk prediction [27]. The sensitivity of our models is comparable to that of microbiological culture (78.2%-83.2% vs 76.7%) [28]. Notably, the novel method using only clinical parameters was quite valuable, indicating that prediction and early diagnosis of T. marneffei infection with our model is possible.
Our findings reveal that typical skin lesions were strongly associated with the probability of T. marneffei infection in the univariate and multivariate analyses. Moreover, RF analysis also revealed skin lesions as the most important indicator in the prediction models. Thus, the presence of skin lesions may be associated with more rapid initiation of antifungal therapy and a shorter mean duration of antifungal treatment [4]. Interestingly, previous studies have found that typical skin lesions were present in 71%-83% of patients infected with T. marneffei [4,17,28], but this rate was only 40.8% in our study. This unexpected finding may have several explanations. In our study, the median interval from admission to initiation of antifungal therapy was only 3 days. Thus, early diagnosis and timely antifungal therapy for T. marneffei may have favoured immunological recovery and a lower incidence of skin lesions. On the other hand, patients with typical skin lesions may also have been diagnosed and treated in primary hospitals. In the absence of skin lesions, the differential diagnosis of T. marneffei infection among AIDS-associated febrile illnesses remains challenging; epidemiological data and specimen examination are essential to make a diagnosis in patients without skin lesions [29].
In addition, we observed that patients with increased AST and AARI levels also had a higher risk of T. marneffei infection. RF analysis revealed increased AST and AARI as important variables in the classification scheme (Fig. 2), which suggested that they are powerful indicators of T. marneffei infection. As important serum biomarkers for assessing liver injury, ALT and AST are sensitive to damage of the hepatocyte membrane and hepatocyte mitochondria, respectively [30,31]. In our study, increased AST and ALT were found in 75% and 63% of the T. marneffei-infected population, respectively, similar to related studies [6]. Clinically, T. marneffei infection is characterized by fungal invasion of multiple organ systems, as the fungus can proliferate in macrophages and disseminate via the reticuloendothelial system [3]. Thus, the liver is vulnerable to T. marneffei attack. Previous studies have also shown that liver biopsy of HIV/AIDS patients can help in making a diagnosis of hepatic T. marneffei infection and that there are different patterns of pathologic damage [32].
Thus, the presence of increased AST and AARI levels may favour accurate and early diagnosis of T. marneffei infection.
Similar to the results of other studies, T. marneffei-infected HIV/AIDS patients in our study were characterized by low CD4+ T-cell count, POAL, hepatomegaly, splenomegaly, and weight loss [4,6,15]. Among patients with lower (<50 cells/mL) CD4+ T-cell counts, immunity cannot be maintained at an adequate level and clinical T. marneffei symptoms (e.g. POAL, hepatosplenomegaly and weight loss) may present earlier in the disease course, which may lead to more rapid diagnosis and appropriate antifungal therapy. Therefore, in the absence of skin lesions, clinicians should also be alert to mixed infection with T. marneffei, especially in endemic areas, in patients with a low CD4+ T-cell count who present with POAL, hepatosplenomegaly, and weight loss and do not respond to empirical or directed therapies.
Several limitations should be noted, including the lack of a more diverse external cohort for further verification and the exclusion of patients with latent T. marneffei infection, which may reduce the effectiveness of our model. More samples and studies based on our models are required before the method can be translated into a practical clinical protocol. Although the diagnosis of T. marneffei infection still needs to be confirmed by fungal culture, the model could identify patients at high risk of T. marneffei infection before later stages, guiding clinicians in early diagnosis and proper treatment.

Conclusions
In conclusion, we have successfully developed and validated accurate prediction models for T.

Declarations
participant prior to their enrollment.

Availability of data and materials
The datasets used or analyzed during the current study are available from the corresponding author on reasonable request.

Consent for publication
Not applicable.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figure 1
Chart of the patients included in the study.

Using ROC curves, we assessed the ability of 9 models to accurately predict T. marneffei infection according to different parameters obtained in the logistic regression. Model A-I was respectively constructed with weight loss, skin lesions, POAL, hepatomegaly, splenomegaly, lymphocyte count, AST level, AARI and CD4+ T-cell count.