Decision Tree Analysis on the Factors Influencing the COI of Infantile Bronchopneumonia Inpatients in a Northern Hospital of China


Abstract
Background: Bronchopneumonia carries a high mortality rate among children in developing countries. It not only affects children's healthy growth but also places economic pressure and additional burdens on their families and society. Social development may change the factors that influence hospitalization costs. This study aimed to explore reasonable cost-control approaches by analyzing the factors related to unreasonable increases in medical costs, so as to standardize diagnosis and treatment practices and inform the scientific management of medical costs. Methods: Decision tree analysis, combined with the characteristic variables of inpatients, was used to mine and analyze the charge data for 15,980 infantile bronchopneumonia inpatients in a northern hospital of China from January 2013 to December 2017. Results: The medical costs of infantile bronchopneumonia inpatients tended to decrease year by year. The factors influencing hospitalization costs, in order of decreasing importance, were salvage, complications, admission condition, discharge condition, hospital stay, age and medical payment mode. The hospitalization cost of 623 (78.5%) patients with salvage during hospitalization was >RMB 10,000. A hospitalization cost of <RMB 10,000 was seen in 14,688 (96.2%) patients without salvage, 13,921 (87.1%) patients without complications (regardless of salvage), 13,860 (86.7%) patients without salvage and complications, and 948 (81.6%) patients without salvage, with complications and with a general admission condition. Conclusion: The cost of illness of infantile bronchopneumonia tended to decrease over the five years. Salvage was the most important factor influencing hospitalization costs, followed by complications.

Background
With economic, scientific and technological development, and with medical and health costs accounting for a growing share of GDP, governments and scholars in many countries have attached increasing importance to research on the cost of illness (COI), with the aim of allocating medical and health resources reasonably and controlling medical costs scientifically. Infantile bronchopneumonia is one of the most common infectious diseases of the lower respiratory tract. Because children have a weak immune system, the inflammation can develop rapidly and invade other tissues and organs, seriously reducing quality of life and even threatening life. This not only affects children's healthy growth but also brings economic pressure and an additional burden to their families and society. Many international studies [1-8] have suggested that viruses are the leading pathogens inducing bronchopneumonia, with Haemophilus influenzae the major bacterial pathogen of acute pneumonia and Streptococcus pneumoniae also a dominant pathogenic bacterium causing infantile bacterial pneumonia in developed and developing countries. Over 5,000,000 children in developing countries die of S. pneumoniae-induced pneumonia annually, with 81% of deaths from pneumonia occurring in children ≤2 years old. As they grow older, children develop stronger immunity as their bodies mature, and they recover more quickly from disease. [...] hospitalization costs. Therefore, large-scale data mining and analysis are needed to reflect the current trend in the COI of infantile bronchopneumonia. It is also necessary to explore reasonable cost-control approaches by analyzing the factors related to unreasonable increases in COI, so as to standardize diagnosis and treatment practices and inform the scientific management of medical costs.

Data source
All cases of bronchopneumonia in children aged 1-14 years who were admitted to a northern hospital of China from January 2013 to December 2017 were included in the study. The following data were collected: demographic information (age, sex and medical insurance mode) and diagnosis and treatment information (hospitalization date, hospital stay, hospitalization costs, admission condition, discharge condition, complications and salvage). Patients with the same identity information were regarded as repeat inpatients, and only the data for their first hospitalization were collected.
The medical payment mode was classified into the following types: medical insurance for urban residents, new-type rural cooperative medical insurance, business insurance and out-of-pocket payment.
The hospital stay was divided into two categories: ≤7 days and >7 days.
Based on the medical situation, the admission condition was divided into three levels: critical, emergent and general.
Based on the outcome, the discharge condition was classified into five types: improved, cured, uncured, death and other.

Analysis methods
The decision tree is a supervised machine learning method that estimates the probability of an outcome from the probabilities of known events and uses these to construct a classification tree. It is widely applied to classification and regression tasks and has the advantages of being intuitive, efficient and measurable. The datasets were classified using the J48 decision tree classifier in the WEKA 3.8.1 (Waikato Environment for Knowledge Analysis) tool, and a classification tree based on the optimal information attributes was then constructed with the same method [10]. The main purpose of the decision tree is to build a prediction model of the target variable from the predictor variables; as a supervised method, it classifies samples into the categories of interest using "if-then" rules [11]. The algorithm finds the most important independent variable and places it at the root node, and then introduces the next best-fitting variable at each subsequent split (known as bifurcation). The tree flows from top to bottom, from the root node to internal nodes (independent variables) and then to terminal leaf nodes (class predictions) [12][13][14][15][16][17]. In the decision tree, the first variable (root) is the most important factor for classifying the data, and variables farther from the root are progressively less important [18]. All variables along a path are regarded as predictors (the "if" part), and the class labels of the leaf nodes are the expected results (the "then" part). To avoid over-fitting and maintain parsimony, unnecessary terminal branches can be removed according to the defined algorithm so as to prune the generated tree without reducing classification accuracy [19].
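The attribute selection step described above can be illustrated with a minimal sketch. J48 is WEKA's implementation of C4.5, which ranks candidate split attributes by gain ratio; the code below is a simplified version of that criterion only (no pruning, no numeric attributes, no missing values). The toy records and the names `entropy`, `gain_ratio` and `rows` are hypothetical and are not the study data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, divided by split information."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted entropy of the target after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    split_info = entropy([r[attribute] for r in rows])
    gain = base - remainder
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (NOT the study data).
rows = [
    {"salvage": "yes", "sex": "m", "cost": "high"},
    {"salvage": "yes", "sex": "f", "cost": "high"},
    {"salvage": "no", "sex": "m", "cost": "low"},
    {"salvage": "no", "sex": "f", "cost": "low"},
    {"salvage": "no", "sex": "m", "cost": "low"},
    {"salvage": "no", "sex": "f", "cost": "high"},
]

scores = {a: gain_ratio(rows, a, "cost") for a in ("salvage", "sex")}
root = max(scores, key=scores.get)  # attribute placed at the root node
print(scores, root)
```

In this toy example the attribute with the highest gain ratio becomes the root, and the same ranking would be repeated within each branch to choose the next split.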
[9], suggesting that the medical costs of infantile bronchopneumonia inpatients showed a yearly decreasing trend, despite an improvement in the standard of living, an increase in commodity prices and changes in some social and economic factors.

General data
We created a database of 15,980 infantile bronchopneumonia inpatients who received treatment in a northern hospital of China from January 2013 to December 2017. The database consisted of 10 variables: eight input and two target variables. The input variables were age, sex, medical payment mode, hospital stay, salvage, complications, admission condition and discharge condition. The two target variables were hospitalization costs of >RMB 10,000 and <RMB 10,000. The classification characteristics were compared between the two target groups using the χ² test in SPSS 19.0 statistical software, and the analysis showed significant differences (Table 2).
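As an illustration of the group comparison, the sketch below computes the Pearson χ² statistic for a contingency table of counts. It is a plain-Python approximation of what a statistics package reports, without continuity correction and without a p-value; the counts in `table` are hypothetical and do not come from Table 2, and the function name `chi_square` is an assumption for this example.

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2 x 2 counts: rows = characteristic present / absent,
# columns = cost >RMB 10,000 / <RMB 10,000.
table = [[30, 10], [20, 40]]
stat = chi_square(table)

# 3.841 is the 5% critical value of the chi-square distribution with 1 degree of freedom.
significant = stat > 3.841
print(round(stat, 3), significant)
```

For a 2 x 2 table the statistic has one degree of freedom, so a value above 3.841 indicates a difference between the two cost groups at the 5% level.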
A decision tree with 13 nodes and 15 leaf nodes was constructed from the datasets of infantile bronchopneumonia inpatients using the J48 algorithm with 10-fold cross-validation (Fig. 1). The decision tree ranked the factors influencing hospitalization costs in order of decreasing importance: salvage, complications, admission condition, discharge condition, hospital stay, age and medical payment mode. Table 3 lists all 13 "if-then" rules used for modeling. The hospitalization cost of 623 (78.5%) bronchopneumonia patients with salvage during hospitalization was >RMB 10,000.
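To show how "if-then" rules of this kind read as code, the sketch below encodes only the three paths that are stated in the text (salvage; no salvage and no complications; no salvage, complications and a general admission condition). It is not a reproduction of Table 3: the function name `predict_cost_class` is hypothetical, treating the salvage branch as a leaf is an assumption, and every path not described in the text returns `None`.

```python
def predict_cost_class(salvage, complications, admission_condition):
    """Return '>RMB 10,000', '<RMB 10,000', or None for paths not given in the text.

    salvage, complications: booleans; admission_condition: 'critical',
    'emergent' or 'general'.
    """
    if salvage:
        # Assumption: the salvage branch is treated as a leaf.
        return ">RMB 10,000"
    if not complications:
        return "<RMB 10,000"
    if admission_condition == "general":
        return "<RMB 10,000"
    # Remaining paths depend on rules that are only listed in Table 3.
    return None


print(predict_cost_class(True, False, "general"))
print(predict_cost_class(False, True, "general"))
print(predict_cost_class(False, True, "critical"))
```

Each `if` corresponds to one internal node on a root-to-leaf path, and the returned label corresponds to the class of the leaf.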
The performance of the decision tree for the hospitalization costs of children with bronchopneumonia was evaluated using a confusion matrix (Table 4). The correct classification rate, wrong classification rate, precision and recall of the prediction model were 78.6% (12,559/15,980), 21.4% (3,421/15,980), 78.1% and 78.6%, respectively; precision and recall were well balanced.
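The relationship between these measures can be checked with a short sketch. It assumes the reported precision and recall are class-weighted averages, which would explain why recall equals the correct classification rate; that reading is an assumption, not something the text states. Only the totals (12,559 correct out of 15,980) come from the text; the individual cell counts in `matrix` are hypothetical, since Table 4 is not reproduced here.

```python
def weighted_metrics(matrix):
    """matrix[i][j] = number of cases of actual class i predicted as class j."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n_classes))
    precision = recall = 0.0
    for c in range(n_classes):
        actual = sum(matrix[c])
        predicted = sum(matrix[r][c] for r in range(n_classes))
        weight = actual / total  # share of cases that truly belong to class c
        recall += weight * (matrix[c][c] / actual if actual else 0.0)
        precision += weight * (matrix[c][c] / predicted if predicted else 0.0)
    accuracy = correct / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical cells chosen so that the diagonal sums to 12,559 and the
# whole matrix sums to 15,980, matching the totals reported in the text.
matrix = [[11800, 2000], [1421, 759]]
metrics = weighted_metrics(matrix)
print({name: round(value, 3) for name, value in metrics.items()})
```

With class-weighted averaging, recall is always identical to accuracy, whereas precision depends on how the errors are distributed between the two classes.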

Discussion
Hospitalization costs can serve as an important indicator for evaluating whether medical practice is standardized. Excessive hospitalization costs place a heavy economic burden on patients' families and waste medical insurance resources. It is therefore necessary to analyze both the total hospitalization cost and the structure of its component charges, which can provide a foundation for health economic research on diseases and a reference for health management departments formulating COI-related policies. Presenting cost study results and data allows hospital managers to understand the features of COI, and offers data support for standardizing reasonable clinical treatment, managing the diagnosis and treatment practices of medical staff, and giving patients access to the best medical services.
In the present study, we described the trend in hospitalization costs and the factors influencing them by analyzing the COI of infantile bronchopneumonia inpatients from a northern hospital of China during 2013-2017. Overall, the hospitalization costs of infantile bronchopneumonia inpatients tended to decrease yearly during 2013-2017. Owing to policy regulation, neither hospitalization costs nor the disease burden increased, even though living standards gradually improved and material life was enriched.
Salvage and complications had the greatest impact on hospitalization costs and were the top two influencing factors. Progression to an emergent state that requires salvage, and the development of complications, should therefore be prevented wherever possible, because both increase the cost burden.
The analysis suggests that the following measures can reasonably reduce the hospitalization costs of infantile bronchopneumonia: (i) actively carry out health education for patients and their families and popularize knowledge of disease prevention and treatment, so that serious disease states before admission are avoided; (ii) have hospital managers urge medical staff to avoid unnecessary examinations and charges, and shorten the average hospital stay by completing necessary diagnostic examinations in outpatient departments and not repeating them after hospitalization; (iii) provide examination results promptly by improving the information system, so that clinicians can access results by computer without waiting for paper reports; and (iv) improve the professional competence and technical level of doctors and establish methods for the scientific management of medical costs.
However, this study analyzed only a 5-year inpatient database from a single hospital, so the analysis of the factors affecting hospitalization costs was limited. A longer time range and broader coverage would allow more rigorous and comprehensive results.

Conclusions
We used a decision tree to analyze the factors influencing the cost of illness of infantile bronchopneumonia from a database of inpatients in a large northern Chinese hospital. We used seven features of patients as predictor variables. The classifier, trained with J48, indicated that the factors influencing hospitalization costs, in order of decreasing importance, were salvage, complications, admission condition, discharge condition, hospital stay, age and medical payment mode. These results allow hospital managers to understand the factors influencing the COI of infantile bronchopneumonia inpatients. Such studies can provide a foundation for health economic research.

Consent for publication
Not applicable

Availability of data and materials
The datasets analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
All authors declare that they have no conflicts of interest.

Authors' contributions
QG and DP had the initial conception of the idea and background for this study. DP and YY made contributions in the acquisition and analysis of the data. JG performed the literature search. All authors contributed to the writing, reviewing and final approval of the manuscript.

Table 3 Thirteen rules for hospitalization costs of infantile bronchopneumonia inpatients extracted from the decision tree

Figure 1 Dataset decision tree constructed with J48 algorithm (the digit in each node bracket represents the sample size)