The Relationship between Chinese Outpatients’ Characteristics and Healthcare-Seeking Behavioral Decision-Making: Decision Tree and Linear Regression Analyses Based on Survey Data in Jiaxing


Background: Chinese patients generally experience difficulties and high costs when obtaining medical services. One widely accepted reason for these difficulties is the imbalance in medical standards and resources among hospitals, which cannot be changed in the short term. This article explores the most perplexing healthcare-seeking problems in China by considering the rules that govern how different patients choose where to see a doctor.
Methods: Data mining and analysis can characterize a person and explore the heuristics underlying his or her decisions using a combination of comprehensive feature data. Accordingly, we designed a questionnaire that probed numerous variables relevant to decision-making and analyzed the survey data using a decision tree and linear regression. The decision tree facilitated observation of the healthcare-seeking decision-making routes of different patients. In addition, linear regression analysis revealed that patients tended to choose different hospitals with different features.
Results: This article primarily argues that in China, having medical insurance and having a profession that represents economic strength and social class are the most important factors guiding outpatients' patterns of seeing a doctor. Further, outpatients who live far from hospitals and who recognize that they have a serious disease choose different hospitals than those who live nearby or consider their disease minor.
Conclusion: Using a decision tree and linear regression, we drew portraits of outpatients and traced the decision-making routes behind their characteristics when they saw a doctor. The results provide direction regarding the type and location of future hospital construction in this area, and address how to avoid overcrowding at large hospitals and how to enhance the possibilities for diversion to smaller hospitals.


Introduction
The current medical system in China is the product of the Chinese-type socialist market economic system [1]. To date, public and private hospitals have coexisted. Public hospitals are the main body of a national medical service system that is divided into three levels according to hospital scale and medical level [1]. The tertiary hospitals, which represent the highest level, not only receive consistent policy attention and support through financial subsidies [2], preferential treatment of talent, and loans, but have also made full use of the market economy reforms of the past 30 years. They have retained considerable profits through independent pricing and the operation of the medical market through drugs and examinations, and have formed a monopoly advantage over the first- and second-level hospitals [2]. For this reason, and due to significant differences in the level of diagnosis and treatment among hospitals, many patients choose to be treated in the large tertiary hospitals regardless of whether they have serious or minor disease [2]. As a result, large hospitals are often overcrowded, and Chinese patients increasingly experience difficulties and high costs related to obtaining medical services. Previously, China's administration wanted to implement a hierarchical diagnosis and treatment system, but it was difficult to achieve because of the liberalization and marketization of patients' choices of medical treatment [3].
Although development space for first- and second-level public hospitals and private hospitals is limited, patients are gradually being attracted to seek treatment there because these hospitals have begun employing well-known experts from the tertiary hospitals for consultation and operations, and have improved the medical service environment in a competitive market. In addition, due to poor medical experiences, such as difficulties and high costs associated with seeing a doctor, numerous patients in tertiary hospitals are also being diverted to first- and second-level hospitals [3].
What kinds of patients would choose to go to these hospitals instead of the tertiary hospitals? What are the factors that patients consider when choosing a hospital? These questions represent traditional classification and regression problems.
By building classification and regression models of data obtained using a questionnaire we designed, this study explored the relationship between patients' characteristics and their healthcare-seeking behavioral decision-making, and determined the patterns underlying different patients' decision-making mechanisms for seeking health care. This is tantamount to drawing a portrait of the patient through consideration of comprehensive features. If these portraits are correct, the public medical administrators in a region can generally understand the rules by which various patients make medical decisions. This could facilitate the reasonable distribution of medical resources, and effective guidance and diversion of patients, thereby finally resolving to some extent the difficulties and high costs associated with seeing a doctor in China.

Selection of patient characteristics and questionnaire design
In 1968, Andersen first proposed the behavioral model of medical service utilization, which represented systematic research into patients' healthcare-seeking behavioral decision-making. The structural framework of the model is 1) environmental factors: primarily including the external environment, such as natural, political, economic, and medical systems; 2) population features: including predisposing and enabling factors (personal, family, and community resources) and need (affected by social factors and health beliefs); 3) healthy behaviors; and 4) health results [4]. Andersen also pointed out that among these factors, symptoms themselves are also a form of social construction. Patients' perceptions and interpretations of symptoms are affected by their social and cultural backgrounds [5]. In addition, some scholars believe that a relatively "disadvantaged group" is more inclined to self-diagnosis, choosing a hospital closer to their residence, or a private hospital [6]. Some scholars in China have indicated that patients tend to choose a high-level medical institution given sudden or severe illness, that greater financial resources make it possible for patients to actively seek health care services, and that improvement in cultural literacy leads patients to pay more attention to their own health [7].
Prior research considered various factors that influence the healthcare-seeking behaviors of the population, particularly social and cultural backgrounds as well as individual characteristics, such as symptoms and financial resources. Based on the research paradigm of behavioral economics, Behavioral Decision Theory posits that the emergence of any decision is inseparable from three factors: the contextual factors underlying the decision, the features of the individual's beliefs, and the individual's preference structure [8,9]. Building on the theoretical background of Behavioral Decision Theory, the American psychologist Daniel Kahneman and the Israeli behavioral finance scientist Amos Tversky (1979) proposed a new theory of behavioral decisions, Prospect Theory, by introducing psychology to economics. Prospect Theory is based on the individual's actual state of decision-making; the theory focuses on the psychological reasons for the behavior. According to Prospect Theory, subjective systematic bias and individual risk preference directly affect the decision-making process and the patient's interpretation and value judgment of the expected goal, thus affecting their healthcare-seeking behavioral decision-making.
Subjective systematic bias varies with cognitive ability and disease severity. For example, when patients face uncertain prospects at decision-making nodes, the more serious they consider their illness, the greater their fear, and the greater extent their decision-making will be affected by their risk preference, resulting in less informed or objectively based healthcare-seeking behaviors [10]. In contrast, if the expected result is highly certain, patients will comprehensively consider factors including medical technology, economy, and convenience of medical treatment, and their healthcare-seeking behaviors will tend to be more reasonable [11].
According to the above theories and several important factors influencing healthcare-seeking behavioral decision-making that have been generally recognized by scholars in recent years [12,13,14], we selected seven characteristics of patients: gender, age, profession, birthplace, location of residence, having or not having medical insurance (social-demographic features), and the degree of self-recognition of the disease (serious, medium, or minor; a psychological feature). Among these, the personal economic factor is very important, but due to the need to respect the privacy of patients, profession was used as a proxy (farmer, industrial worker, staff or civil servant, freelancer or individual owner). Different professions in China represent different classes to some degree, and there is a correlation to some extent between profession and income [15].
In addition, according to the conclusions drawn from Equilibrium Theory and Demand Theory as proposed by Zhang et al. (2002), demographic features and economic factors are the main reasons for the need for regional medical services, which are clearly manifested in outpatients' healthcare-seeking behaviors [16]. Therefore, we selected outpatient hospital departments in Jiaxing, a representative medium-sized city in China. Jiaxing has a unique geographical character, as it is located near Shanghai and Hangzhou, two cities whose hospitals have higher medical levels than those in Jiaxing. Jiaxing's patients can seek health treatments in Shanghai and Hangzhou relatively easily. The next step was to design the questionnaire. As mentioned earlier, Prospect Theory examines the influences of psychology and cognitive biases on individual behavioral decision-making. Therefore, the questionnaire was designed for the outpatients, considering their emotions, preferences, and other conditions, consistent with the research paradigms of behavioral economics. The outpatients' behavioral decision-making could be studied qualitatively or quantitatively with reference to their social-demographic and psychological features [17].
The questionnaire consisted of two parts: the first part queried seven characteristics of the outpatients, while the other part addressed patients' healthcare-seeking behavioral decision-making via three questions. First, the respondent was asked about his or her general patterns when seeing a doctor. Specifically, the question asked whether having either a minor or a serious disease would lead them to attend a large hospital, or whether having a minor disease would lead them to attend a small hospital, but having a serious disease would lead them to attend a large hospital. Second, they were asked why they chose to visit the hospital. The question addressed eight factors, such as the level of medical expertise and proximity. The outpatients were also asked to rank these factors according to their perceived importance. Third, they were asked whether they would choose to see a doctor in Shanghai or Hangzhou.
The outpatient departments we selected for the questionnaire were the most representative of Jiaxing's hospitals: two tertiary-level hospitals, two second-level hospitals, and one first-level hospital. The number of hospitals used in this survey at each level was based on the distribution of outpatients among the three hospital levels.
The questionnaires were distributed to the outpatients by random sampling. The outpatients were guided to complete the questionnaires according to their utilization of outpatient services. Among the outpatients we surveyed, only 2% refused to participate. A total of 195 valid questionnaires were obtained, 100 of which were from the tertiary-level hospitals, while the other 95 were from the second- and first-level hospitals. The survey process was approved and supervised by the Health Commission of Jiaxing.

Decision tree analysis
The first issue to be addressed regarding outpatients' decision-making was that there were two different answers to the question on general healthcare-seeking patterns, which belonged to different categories. Research data of this kind, consisting of features and category labels, are appropriately analyzed using a decision tree from machine learning. The decision tree is a non-parametric supervised machine learning method. It can summarize decision rules from a series of data with features and labels, and then present these rules using the structure of a tree chart to solve classification and regression problems. Its advantages are simplicity of explanation, visualization, and application, while one of its potential disadvantages is over-fitting [18]. The C4.5 algorithm was selected for this analysis.
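C4.5 ranks candidate features by their information gain ratio. The following is a minimal sketch of that criterion for a single categorical feature, not the authors' implementation; the function names and the toy data (an insurance flag and pattern labels A and C) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """Information gain ratio of one categorical feature, the C4.5 split criterion."""
    total = len(labels)
    # Group the labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    information_gain = entropy(labels) - conditional
    # Split information penalizes features with many distinct values.
    split_info = entropy(feature_values)
    return information_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: insurance status (1 = insured) and pattern label A or C.
insured = [1, 1, 1, 1, 0, 0, 0, 0]
pattern = ["A", "A", "A", "C", "C", "C", "C", "A"]
print(round(gain_ratio(insured, pattern), 3))
```

A feature whose gain ratio is close to zero, as reported for gender in this study, separates the labels hardly better than no split at all.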
We used Python in the PyCharm development environment to analyze the questionnaire data of the 195 outpatients according to the C4.5 algorithm [19], and obtained the information gain rates of the seven characteristics of the outpatients. The information gain rates for gender and degree of self-recognition of the disease were c. 0.1% or smaller; that is, splitting on these features reduced entropy very little. Additionally, from a common-sense perspective, the degree of self-recognition of a disease is unrelated to the first question in the questionnaire. Therefore, for simplicity, we removed these two features from the decision-tree model and reanalyzed the data using the C4.5 algorithm. The initial prototype of the decision tree was obtained by selecting features from the largest to the smallest information gain rate and growing the branch and leaf nodes from the top to the bottom. Because the decision tree is prone to over-fitting, we used a test data set to prune the initial decision tree, to reduce the risk of over-fitting and improve the generalization ability of the model. The test data set was randomly selected at a ratio of c. 1:8 from the 195 patients, and pruning used Cost Complexity Pruning (CCP, a post-pruning method) [20]. The error proportion of the initial decision tree on the test data set before pruning was 23%. After repeated pruning, the error proportion after the 13th CCP prune was the smallest, at 20.5%; accordingly, we chose this as the final decision-tree model. Finally, the model was drawn using Matplotlib in PyCharm and is shown in Fig. 1.
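For reference, cost complexity pruning is usually stated as follows. This is the standard textbook formulation, given here as an assumption about what was applied, since the text does not spell out its exact variant. Here $R(\cdot)$ is the misclassification cost, $|\tilde{T}|$ is the number of leaves of tree $T$, and $T_t$ is the subtree rooted at internal node $t$.

```latex
\[
R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|,
\qquad
g(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}
\]
```

At each step, the internal node with the smallest $g(t)$ is collapsed into a leaf, which yields a nested sequence of progressively smaller trees. The tree with the lowest error on held-out data is then selected, which corresponds to the 13th prune in the analysis above.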

Linear regression analysis
The second question to the outpatients regarding healthcare-seeking behavioral decision-making was why they chose to visit this hospital today; there were eight factors they needed to consider. Among these, four factors were regarded as indicators representing the "hard power" of the hospital's medical advancement, and were assigned 4, 3, 2, and 1 points, respectively, according to the level of advancement. The four factors were as follows: A, the hospital or doctors in the hospital have a high level of medical expertise; G, the hospital has advanced diagnostic equipment; F, the hospital has available renowned experts from large hospitals for consultations; and E, doctors in this hospital demonstrate good attitudes and provide detailed explanations. The other four factors were regarded as indicators representing the "soft power" of the hospital, and were assigned -4, -3, -2, and -1 points, respectively, according to the level. The four factors were as follows: B, the hospital is close to home; C, it is convenient to see a doctor in the hospital; D, the proportion of medical insurance reimbursement is higher in this hospital; and H, medical treatments are cheap in this hospital. In this case, a higher score indicated that the outpatient valued "hard power" to a greater extent, while a lower score indicated that the outpatient valued "soft power" to a greater extent.
In addition, the questionnaire asked outpatients to rank options based on their perceived importance. We took a weighted average of these option scores. The weighting formula used was

\[ M = \frac{\sum_{i} W_i M_i}{\sum_{i} W_i}, \]

where M is the weighted average score, W_i is the weight value of the ith option, and M_i is the score of the ith option.
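The scoring described above can be sketched in a few lines. The option points follow the text; the weights do not, because the source does not state how ranks were turned into weights. The reversed-rank weighting used here is purely an assumption, and the function name is hypothetical.

```python
# Points per option as described in the text: "hard power" options are
# positive, "soft power" options are negative.
OPTION_POINTS = {"A": 4, "G": 3, "F": 2, "E": 1, "B": -4, "C": -3, "D": -2, "H": -1}


def weighted_average_score(ranked_options):
    """Weighted average of option points for one respondent.

    ranked_options lists the options from most to least important.
    ASSUMPTION: the weight of an option is its reversed rank (the first of k
    options gets weight k, the last gets 1); the actual weights are not given.
    """
    k = len(ranked_options)
    weights = [k - position for position in range(k)]
    weighted_sum = sum(w * OPTION_POINTS[o] for w, o in zip(weights, ranked_options))
    return weighted_sum / sum(weights)


# A respondent who ranks expertise first, proximity second, equipment third.
print(weighted_average_score(["A", "B", "G"]))
```

Under this assumed weighting, a positive result means the respondent leaned toward "hard power" factors and a negative result means a lean toward "soft power" factors, which matches the interpretation given in the text.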
Accordingly, we obtained the weighted average scores of the 195 outpatients. One question raised in our analysis is whether there were causal links between these 195 scores and the seven characteristics of the outpatients. Another question is whether characteristics could explain the target variable, namely scores. We addressed these questions using regression techniques.
First, we defined the regression model as

\[ M_i = \beta_0 + \beta_1 N_i + \beta_2 O_i + \beta_3 P_i + \beta_4 Q_i + \beta_5 R_i + \beta_6 S_i + \beta_7 T_i + \varepsilon_i, \]

where M is a continuous random variable, consisting of the weighted average scores of the outpatients; N is gender, O is age, P is profession, Q is birthplace, R is residency location, S is having or not having medical insurance, and T is the degree of self-recognition of disease, all of which are discrete random variables. The remaining symbols (β_0 to β_7 and the error term ε_i) represent parameters and i represents the order of outpatients.
The dummy variables were defined using Stata software. All seven variables were discrete; among them, N, R, and S were initially assigned 0 or 1 and did not need further definition. Dummy variables were defined for O, P, and Q, respectively. T was an ordinal variable with the same span across all levels, and was therefore treated as a continuous variable; thus, we defined 0, 1, and 2 as representing minor disease, moderate disease, and serious disease, respectively.
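The coding scheme above can be illustrated with a short sketch. The study did this step in Stata; the Python below only mirrors the idea. The helper name, the category order, and the choice of the first category as the reference level are assumptions, since the authors' coding order is not given.

```python
def one_hot_drop_first(values, categories):
    """Encode a categorical column as 0/1 dummy columns, omitting the first
    category as the reference level to avoid perfect collinearity."""
    return [[1 if v == c else 0 for c in categories[1:]] for v in values]


# ASSUMPTION: category names follow the profession list in the text; the
# order and the reference category are chosen here for illustration only.
PROFESSIONS = ["farmer", "industrial worker", "staff or civil servant",
               "freelancer or individual owner"]

# Ordinal coding of T (degree of self-recognition of disease), as in the text.
SEVERITY = {"minor": 0, "moderate": 1, "serious": 2}

rows = one_hot_drop_first(["farmer", "industrial worker"], PROFESSIONS)
print(rows)
print([SEVERITY[s] for s in ["serious", "minor"]])
```

A variable with k categories yields k - 1 dummy columns; the omitted category becomes the baseline against which the regression coefficients are interpreted.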
Next, to eliminate any adverse effect of multicollinearity on the multiple linear regression model, stepwise regression was adopted. First, Stata was used to perform regressions on M and each independent variable. M and P2 (industrial worker), M and Q1 (near Jiaxing), M and Q2 (distant), M and R, and M and T all passed the F-test (P < 0.05). Then, the stepwise regression command ("swreg") in Stata was used to perform stepwise regression on M and the five variables that passed the F-test.
In the process, the two independent variables Q1 and Q2 were removed in the stepwise regression and the remaining three independent variables jointly passed the F-test (P < 0.0001), although the t-test P-value of dummy_P2 was slightly larger (P = 0.080).
Finally, considering that the error terms of the 195 outpatients would exhibit heteroscedasticity (i.e., the features of the outpatients differed greatly, such as farmer versus worker), we used Stata to perform a White test on the above model. The P-value of the χ²(2) test was 0.0472, namely less than 0.1, thereby implying that heteroscedasticity was present. Accordingly, weighted least squares correction was executed using the wls0 command of Stata; thus, the final result was obtained.
After the correction of heteroscedasticity, the adjusted R² value rose from 0.1970 to 0.2144, and the P-value of the F-test remained < 0.0001, thereby representing an improved model. However, dummy_P2 failed the t-test (P = 0.209).
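To make the correction step concrete, the sketch below solves the weighted least squares normal equations and computes an adjusted R² in plain Python. It is an illustration under stated assumptions, not a reproduction of Stata's wls0: the weights are simply passed in by the caller rather than estimated from residuals, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_least_squares(x_rows, y, weights):
    """Coefficients (intercept first) minimizing sum_i w_i * (y_i - x_i . b)^2."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend intercept column
    k = len(design[0])
    # Normal equations: (X' W X) b = X' W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(design, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * yi for r, yi, w in zip(design, y, weights)) for i in range(k)]
    return solve(xtwx, xtwy)


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical data lying exactly on y = 1 + 2x, with arbitrary weights.
coefficients = weighted_least_squares([[0], [1], [2], [3]], [1, 3, 5, 7], [1, 2, 3, 4])
print(coefficients)
```

Giving less weight to observations with larger error variance is what makes the estimates more efficient under heteroscedasticity. The adjusted R² penalizes each additional regressor, which is why it is the figure compared before and after the correction.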

Decision tree analysis
As seen in Fig. 1, the category labels A and C of the decision tree are the answers to the first question of the questionnaire about the general patterns of seeing a doctor: A denotes choosing large hospitals for both minor and serious diseases, whereas C denotes choosing small hospitals for minor diseases and large hospitals for serious diseases.
(1) First, whether outpatients have medical insurance is the primary factor in the classification; those without medical insurance are more likely to choose C.
(2) Second, under the premise of having medical insurance, profession becomes the main factor in the classi cation. Profession types 2 and 3 were more inclined to choose A, while type 1 (industrial worker) was more inclined to choose C.
(3) Third, if the profession was farmer and the residence location was near Jiaxing, the person was more likely to choose C.

Linear regression analysis
The final regression model retained three independent variables: residence location (R), degree of self-recognition of disease (T), and the profession dummy P2. Outpatients who usually resided near Jiaxing valued hospitals' "hard power" when making healthcare-seeking decisions to a greater extent than did outpatients who lived in Jiaxing city. Similarly, outpatients who recognized themselves as having serious diseases valued hospitals' "hard power" to a greater extent than did those with moderate or minor diseases.

Discussion
Most previous studies analyzed the relationships between patient characteristics and healthcare-seeking behaviors for specific diseases and a limited set of characteristics [22,23,24]. In contrast, the current study focused on general characteristics of outpatients' behaviors, which are closely related to the current healthcare-seeking problems in China. In addition, this study used data mining to draw the portraits behind various outpatients' characteristics. The approach can be extended to other settings: during the current COVID-19 epidemic, for example, some people do not wear masks and do not seek medical treatment in a timely manner, while others seek hospital attention immediately; these behaviors are important to the spread of the infectious disease. Using data mining and analysis to characterize these different types of patients could help decision-makers take targeted actions to prevent further epidemic diffusion.
We discuss the findings as follows. First, we used a decision tree to analyze whether people chose large hospitals for both minor and serious diseases, or chose small hospitals for minor disease and large hospitals for serious disease. Behavioral decision-making has most commonly been assessed using regression; decision-tree analysis has seldom been adopted, even though it can permit better visualization of decision-making routes. The final decision tree showed that having or not having medical insurance was the primary factor that guided the outpatients' decisions regarding seeing a doctor, which indicates that the economic factor was still the most important for outpatients in China. In addition, according to the 2017 Jiaxing city statistical yearbook data [25], the ratio of citizens with medical insurance to those without was 1.5:1 (216.52/140.45, unit: 10,000 people). However, this ratio was 3.9:1 (155/40, unit: 1 person) in our sample. This discrepancy between the ratios may indicate that people without medical insurance are more likely than those with insurance to choose not to visit hospitals. This conclusion is consistent with findings of the statistical information center of the health department in China: the basic medical insurance system for urban workers and the new rural cooperative medical system have significantly promoted the hospitalization rate of Chinese patients [15]. The result above shows the importance of economic factors, because not having medical insurance means that outpatients have to spend more money to see a doctor. This is consistent with the current need for hierarchical diagnosis and treatment in Chinese hospitals, and further suggests that changing the ratio of medical insurance reimbursement would increase the use of outpatient facilities, which might be implemented as a reduced reimbursement ratio in large hospitals and an increased ratio in small hospitals.
Moreover, among persons with medical insurance, having a profession that is representative of the higher social classes in China was the second most influential factor that guided outpatients' patterns of seeing a doctor. For example, staff, civil servants, freelancers, and individual owners have greater advantages than farmers and industrial workers in terms of economic power and individual capabilities. Therefore, such persons will choose large hospitals with better diagnosis and treatment expertise, regardless of whether they have minor or serious disease. In contrast, farmers and industrial workers are more inclined to avoid seeking the highest standard of care. This conclusion is similar to those reported by international scholars: income is an important determinant of medical accessibility, especially in countries that lack universal medical insurance and predominantly rely on private medical insurance [26]. Discounting the objective factors of individual income and capabilities, we arrive at the conclusion that the fundamental way to solve outpatients' overcrowding in large hospitals is to reduce current discrepancies among Chinese hospitals in the quality of diagnosis and treatment.
In addition, for farmers with medical insurance, diversion to medium or small hospitals is easier to achieve for outpatients living near Jiaxing. Long distances and inconvenience are factors outpatients consider when they seek medical treatments. This is consistent with results in America [27], Australia [28], and Turkey [29], in which research confirmed a negative relationship between distance and readmission rate, and a positive relationship between distance and length of hospital stay. This also suggests that medical regional administrators have the opportunity to strengthen outcomes by building first- and second-level hospitals near cities to retain more patients and to achieve diversion to these hospitals.
Regarding the second question of the questionnaire, we used linear regression to explore the effects of outpatients' characteristics on healthcare-seeking behavioral decision-making. The results showed that outpatients who recognized they had serious diseases or who lived further away from healthcare facilities considered important the "hard power" that reflected the hospital's level of medical expertise. Otherwise, "soft power" was prioritized, which reflected the convenience and comfortable environment of hospital diagnosis and treatment. To some extent this is consistent with the Prospect Theory account of patients' healthcare-seeking behavioral decision-making, which we analyzed earlier: when outpatients consider themselves to have a serious disease, decision-making is more affected by their risk preference and they are more inclined to consider the expertise of the care facility. In contrast, outpatients who consider themselves as having a minor disease are more likely to consider other factors, such as economics and convenience of medical treatments, and their healthcare-seeking behaviors tend to be more comprehensive and rational. This suggests that managers of small hospitals could attract certain outpatients away from large hospitals, such as outpatients with minor or moderate disease, by enhancing the hospital's "soft power." Furthermore, outpatients who currently lived further away from the hospital were more concerned about the hospital's "hard power," indicating that their main preference was to cure disease, with less consideration of "soft power" factors. In contrast, outpatients living near the hospital paid less attention to "hard power"; "soft power" factors were their concern, given the same seriousness of disease as outpatients who lived further away. That is, the distance one must travel to the hospital is also an important cost consideration for outpatients.
This conclusion is consistent with Wu [30] and Jiang [31] in China: the further the distance to see a doctor, the greater the extent to which patients consider the medical expertise level of hospitals. Therefore, our suggestion is to promote the building of first- and second-level hospitals near cities to retain more patients and to achieve diversion, which again accords with our previous judgment from the decision-tree analysis.
In the linear regression analysis, we also saw that profession affected the outpatients' decisions when seeking medical treatments. However, the effect did not reach significance, probably due to sample size.
In addition, the adjusted R² value of the model, which used seven outpatient characteristics to explain the target variable, was only 0.2144, which indicates that important characteristics may remain that we did not include.
Finally, the results of the third question regarding healthcare-seeking behavioral decision-making showed that most outpatients (98%) believed the medical expertise of larger hospitals in larger cities near Jiaxing (such as Shanghai and Hangzhou) was greater than the medical expertise available in hospitals in Jiaxing; the participants chose to visit the former hospitals when their condition required high-quality care, regardless of distance. This confirms our conclusion that in China, patients believe the medical level of representative hospitals differs between large and medium-sized cities, and hospitals in Jiaxing must improve their standard of medical care to meet the local outpatients' needs.

Declarations
Availability of data and materials The data that support the findings of this study are available from the questionnaires we designed for the outpatients; the designed questionnaire and all data generated or analyzed during this study are included in the supplementary information files.
Ethics approval and consent to participate All methods in the study were carried out in accordance with relevant guidelines and regulations.
The whole investigation process for the study was approved by the Ethics Committee of Jiaxing Health Commission, which is the local administration agency, and all outpatients and family members who completed the questionnaires participated voluntarily. We obtained informed consent from all subjects or, if subjects were under 18, from a parent and/or legal guardian.