Data
The study included 1.07 million health insurance swipe records from the S-City Health Insurance Agency for the period January to December 2018. For every card swipe, the data include card number, age, gender, nature of household, type of settlement, diagnostic information (free text plus ICD code), cost (total cost, reimbursed cost, out-of-pocket cost), and the medical institution visited and its level. Our study uses fully de-identified data and is therefore not considered a human subjects study; a formal institutional review board review is not necessary. We processed the 2018 S-City health insurance swipe records with Python 3.8, excluding records with missing diagnostic information and duplicate swipes. Our objective was to assess the non-parity of utilization of basic public health services, defined as patients turning to distant quality health care resources because peripheral primary care facilities cannot meet their needs; patients whose first visit was at a level 1 hospital and who had a higher-level hospital visit were therefore included in the study. To avoid differences among diseases in the demands they place on the service capacity of medical institutions, we selected a single chronic disease, diabetes, and screened 144,447 records of diabetic patients using the combination of ICD-10 (International Classification of Diseases, 10th Revision) codes and diagnostic text. We distinguished three stages: "abnormal glucose tolerance" was defined as early-stage diabetes (25,002 records), "diabetes" as middle-stage diabetes (108,219 records), and "diabetes" followed by complications as late-stage diabetes (11,226 records).
Grouping the records by card number yielded 8,761 patients with early-stage diabetes, 14,805 with middle-stage diabetes, and 2,129 with late-stage diabetes, whose use of medical resources we then observed. Because early-stage and late-stage patients have a rigid demand for primary and tertiary hospitals respectively, whereas mid-stage patients have more room to choose between primary and quality medical services, we finally targeted the 12,266 patients with mid-stage diabetes whose first consultation was at a primary hospital to study their cross-regional referral behavior. The patient selection process is shown in Fig. 1.
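The selection steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline actually used: the field names (`card`, `date`, `hospital`, `level`, `diagnosis`), the definition of a duplicate swipe, and the toy text rule in `classify_stage` are all hypothetical, and the real screening also used ICD-10 codes.

```python
from collections import defaultdict

def classify_stage(diagnosis_text):
    """Toy text rule (assumption): map a diagnosis string to a stage label."""
    text = diagnosis_text.lower()
    if "abnormal glucose tolerance" in text:
        return "early"
    if "diabetes" in text:
        # Anything written after the word "diabetes" is treated as a complication.
        tail = text.split("diabetes", 1)[1].strip(" ,;:")
        return "late" if tail else "middle"
    return None  # not a diabetes-related record

def select_cohort(records):
    """Return {card: visits} for mid-stage patients whose first visit was at level 1."""
    seen, clean = set(), []
    for r in records:
        if not r["diagnosis"]:
            continue  # drop records with missing diagnostic information
        key = (r["card"], r["date"], r["hospital"], r["diagnosis"])
        if key in seen:
            continue  # drop duplicate swipes (duplicate definition is an assumption)
        seen.add(key)
        clean.append(r)

    by_card = defaultdict(list)
    for r in clean:
        if classify_stage(r["diagnosis"]) == "middle":
            by_card[r["card"]].append(r)

    cohort = {}
    for card, visits in by_card.items():
        visits.sort(key=lambda v: v["date"])  # ISO dates sort chronologically
        if visits[0]["level"] == 1:
            cohort[card] = visits
    return cohort

# Hypothetical records for illustration only.
records = [
    {"card": "A", "date": "2018-01-05", "hospital": "P1", "level": 1, "diagnosis": "diabetes"},
    {"card": "A", "date": "2018-01-05", "hospital": "P1", "level": 1, "diagnosis": "diabetes"},
    {"card": "A", "date": "2018-03-01", "hospital": "T1", "level": 3, "diagnosis": "diabetes"},
    {"card": "B", "date": "2018-02-01", "hospital": "T1", "level": 3, "diagnosis": "diabetes"},
    {"card": "C", "date": "2018-02-01", "hospital": "P1", "level": 1, "diagnosis": ""},
    {"card": "D", "date": "2018-02-01", "hospital": "P1", "level": 1,
     "diagnosis": "abnormal glucose tolerance"},
]
cohort = select_cohort(records)
```

In this toy input only patient `A` is retained: `B` starts at a tertiary hospital, `C` has no diagnosis, and `D` is early stage.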
Feature Engineering
We constructed dummy variables from the original data and classified the characteristics in the patient visit records into supply-side and demand-side factors. The specific indicators are shown in Table 1. The factors influencing patients' choice of medical care can be summarized as demand-side factors (socio-demographic characteristics, disease awareness and disease characteristics, economic factors, social networks, and health insurance), supply-side factors (the price of medical services, the quality of medical services and the level of medical personnel, and the convenience of services), and social factors (population structure such as aging, socio-demographic mobility, cultural attitudes, and the construction of medical service systems). Prediction is achieved by feeding a pre-constructed heterogeneous graph to the graph neural network model. The heterogeneous graph constructed in this study is a patient access network. Its input features fall into two categories, node information (patients, hospitals) and graph structure information (edges), and are attached to patient nodes, hospital nodes, and patient-hospital edges, respectively. By aggregating node information and graph structure information, a representation is computed for each node (patient or hospital) and used as input to the graph neural network link prediction model.
Supply-side factors. The supply of medical resources mainly covers accessibility, service quality, and cost burden. For the patient node features, we determine the approximate coordinates of the primary care hospital each patient visits most frequently and compare its distance to every general hospital; the distance to the nearest general hospital (km) indicates the accessibility of quality medical resources. The number of diabetic patient visits received by the nearest primary care hospital indicates the service quality of nearby primary care, and the average cost per patient indicates the cost burden. For the hospital node features, we add the geographic coordinates and grade of each hospital.
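The accessibility feature can be illustrated as follows. The text does not state which distance formula was used, so this sketch assumes great-circle (haversine) distance on latitude/longitude pairs; the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_general_hospital(patient_coord, general_hospitals):
    """Return (distance_km, hospital_name) for the closest general hospital.

    patient_coord stands in for the coordinates of the primary care hospital
    the patient visits most often, as described in the text.
    """
    return min(
        (haversine_km(*patient_coord, *coord), name)
        for name, coord in general_hospitals.items()
    )

# Hypothetical coordinates for illustration only.
hospitals = {"G1": (0.0, 1.0), "G2": (0.0, 3.0)}
distance_km, name = nearest_general_hospital((0.0, 0.0), hospitals)
```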
Demand-side factors. On the demand side, patients' choice of healthcare-seeking behavior is influenced by socio-demographic characteristics, disease awareness and disease characteristics, economic factors, and health insurance. These characteristics are used as input for the patient nodes. We retain patient sociodemographic characteristics such as gender, age, and household registration; measure health status by the number of categories of ICD-10 disease codes recorded for the patient in 16–18 years; measure the importance patients attach to health by their total frequency of visits; and use the percentage of general hospital visits to measure visit preferences that reflect factors such as financial situation.
Historical utilization behavior. We also learn from patients' past visit choices as a basis for predicting their escalation referral behavior. This is achieved mainly by inputting graph structure information (connection relationships and edge weights). The graph structure information includes both the connections between nodes in the graph and the features of each visit relationship (edge). The specific edge variables are the type of settlement at the time of visit, total cost, coordinated payment, account payment, out-of-pocket payment, major medical expense, provider level, and distance between the patient and the hospital.
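One way to turn visit records into graph structure information is to collapse them into a single weighted edge per patient-hospital pair. The sketch below is illustrative only: using the visit count as the edge weight is an assumption, as are the field names, and only two of the listed cost variables are aggregated to keep the example short.

```python
from collections import defaultdict

def build_edges(visits):
    """Collapse visit records into one edge per (patient, hospital) pair.

    The edge weight is the number of visits (assumption); cost fields are summed.
    """
    edges = defaultdict(lambda: {"weight": 0, "total_cost": 0.0, "out_of_pocket": 0.0})
    for v in visits:
        edge = edges[(v["card"], v["hospital"])]
        edge["weight"] += 1
        edge["total_cost"] += v["total_cost"]
        edge["out_of_pocket"] += v["out_of_pocket"]
    return dict(edges)

# Hypothetical visit records for illustration only.
visits = [
    {"card": "A", "hospital": "P1", "total_cost": 50.0, "out_of_pocket": 10.0},
    {"card": "A", "hospital": "P1", "total_cost": 30.0, "out_of_pocket": 5.0},
    {"card": "A", "hospital": "T1", "total_cost": 200.0, "out_of_pocket": 80.0},
]
edges = build_edges(visits)
```

The resulting dictionary maps each patient-hospital pair to its edge features, which could then be loaded into a heterogeneous graph library of choice.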
Table 1
Basic public health service utilization and its influencing factors
Variable | Explanation |
Explained variables | |
Patient Ambulatory Care Behavior (Y) | Is it an escalation referral (yes = 1, no = 0) |
Core explanatory variables | |
Demand-side factors | |
Sociodemographic characteristics (society) | Age, household registration (1 = urban, 0 = rural), gender |
Patient health foundation (health) | Degree of disease (early, middle, late), co-morbidity (number of species) |
Health emphasis (psychology) | Frequency of visits |
Supply-side factors | |
Price of medical services (price) | Average cost per visit at primary hospitals |
Quality of medical services and level of medical staff (support) | Number of diabetic patients seen in primary care hospitals |
Health care accessibility (geography) | Distance to the nearest tertiary hospital (km) |
Model Construction
The service capacity of primary hospitals and the spatial accessibility of general hospitals are the main factors affecting the equalization of the supply mix. We therefore adopt two optimization approaches in different regions, improving the service capacity of primary hospitals and resetting the geographic location of general hospitals, to form a new healthcare resource supply mix that is input to the graph neural network model as a dataset. Specifically, for the first pathway, primary hospitals with below-average volumes are raised to the average level, and the likelihood that patients living near these primary hospitals go to tertiary hospitals for referral is predicted. For the second pathway, we locate the center of the long-distance referral patients, use the geographic coordinates of that center as the location of a new tertiary hospital, and select the edges of referral patients whose distance to the new tertiary hospital is shorter than their distance to the original referral hospital, reconnecting those edges to the new tertiary hospital (Fig. 2). The prediction set thus consists of predictions of the likelihood that a patient goes to the original referral hospital after the node or edge information has been changed. The result is the predicted probability of a cross-regional visit, expressed as a number from 0 to 1. The following figure shows the path diagrams for the two prediction tasks.
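The two counterfactual interventions can be sketched as simple transformations of the input data. This is an illustration under assumptions, not the authors' implementation: coordinates are treated as planar (for example, projected kilometres), the center is taken as the arithmetic mean of patient coordinates, and all names (`raise_to_average`, `rewire_to_new_hospital`, `T_new`) are hypothetical.

```python
import math

def raise_to_average(volumes):
    """Pathway 1: lift below-average primary hospital volumes to the mean."""
    mean = sum(volumes.values()) / len(volumes)
    return {hospital: max(volume, mean) for hospital, volume in volumes.items()}

def centroid(coords):
    """Arithmetic mean of planar (x, y) coordinates."""
    xs, ys = zip(*coords)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def rewire_to_new_hospital(referrals, patient_xy, hospital_xy, new_id="T_new"):
    """Pathway 2: place a new tertiary hospital at the referral patients' center
    and reconnect every edge whose patient is closer to it than to the
    original referral hospital."""
    new_xy = centroid([patient_xy[p] for p, _ in referrals])
    rewired = []
    for patient, hospital in referrals:
        d_new = math.dist(patient_xy[patient], new_xy)
        d_old = math.dist(patient_xy[patient], hospital_xy[hospital])
        rewired.append((patient, new_id if d_new < d_old else hospital))
    return new_xy, rewired

# Hypothetical inputs for illustration only.
volumes = raise_to_average({"P1": 100, "P2": 300})
patient_xy = {"p1": (0.0, 0.0), "p2": (2.0, 0.0), "p3": (10.0, 0.0)}
hospital_xy = {"T1": (20.0, 0.0), "T2": (10.0, 1.0)}
referrals = [("p1", "T1"), ("p2", "T1"), ("p3", "T2")]
new_xy, rewired = rewire_to_new_hospital(referrals, patient_xy, hospital_xy)
```

In the toy input, `p1` and `p2` are reconnected to the new hospital, while `p3` stays with `T2` because its original hospital is still closer.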
We randomly assign the data to training, validation, and test sets in a 7:2:1 ratio. The model is developed on the training set. The validation set is used to tune the model hyperparameters (i.e., model configuration parameters) and to give an initial assessment of the model's capabilities. The difference between the predicted and true values is calculated with the cross-entropy loss function, and we consider the model to have been trained to an acceptable level when the prediction error reaches its minimum. The trained link prediction model is then applied to the datasets produced by the two optimization approaches and outputs the probability that each patient will visit a specific hospital.
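As a minimal sketch of the split and the loss described above, the following uses only the Python standard library. The seed, the function names, and the clipping constant `eps` are assumptions made for illustration; in practice the graph library's own splitting and loss utilities would be used.

```python
import math
import random

def split_721(items, seed=0):
    """Randomly split items into training, validation, and test sets (7:2:1)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 7 // 10
    n_val = len(items) * 2 // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

train, val, test = split_721(range(100))
loss = binary_cross_entropy([1, 0], [0.5, 0.5])
```

For link prediction, the items being split would be patient-hospital edges, with label 1 for an observed visit and 0 for a sampled non-edge.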
We selected four graph neural network models for comparative analysis: the relational graph convolutional network (RGCN)[], a classical graph neural network model; HetGNN, a classical heterogeneous graph neural network model that was the first to formally define a heterogeneous graph representation learning problem considering both the structural heterogeneity of the graph and the heterogeneity of node content[]; and two heterogeneous graph neural network models from 2020, HGT[] and CompGCN[]. We selected the model with the best performance according to the evaluation metrics. Our models are implemented in Python with the DGL-based OpenHGNN toolkit.
Ethics declarations
Since only de-identified data were used, the Tel-Aviv University institutional review board (IRB) waived the requirement of informed consent by the patients. The IRB therefore determined that the Chinese Municipal public administrations of Health public dataset used in this study is exempt from approval.
Patient Cohort Description
Among all diabetic patients, primary hospitals received 91.5% of early-stage, 88% of middle-stage, and 43.9% of late-stage patients, while tertiary hospitals mainly saw 21.8% of middle-stage and 56.3% of late-stage patients. The pattern of doctor-patient contact in the city therefore reflects the principle of hierarchical diagnosis and treatment, and the structure of the supply system of medical and health resources and the order of access to medical care in City S can be considered relatively reasonable. However, we also found that access to quality healthcare resources is not yet universal, and that patients living on the urban fringe face a large disparity in the accessibility of quality resources. About 8% of diabetic patients show ambulatory (escalation referral) behavior, a small share overall; however, 13% of middle-stage patients do so. Whereas the 10% of late-stage patients who do so are driven by the rigid demand of the disease, ambulatory behavior among middle-stage patients most likely reflects the failure of the hospital service supply to meet their needs. Because we focused on patient escalation visit behavior, we limited our analysis to the 12,266 middle-stage patients who had their first visit at a primary hospital in 2018.
For the final analyzed cohort of middle-stage patients whose first visit was at the primary level (n = 12,266), a total of 91,159 health insurance swipe records existed. Approximately 91.37% of these patients (n = 11,208) were seen only at the primary level (i.e., no referral to a higher-level hospital), and about 8.63% were also referred to a higher-level hospital. Table 2 presents the basic characteristics of referred and non-referred patients. The results show clear differences in both the demand-side factors of personal socioeconomic characteristics and the supply-side factors of medical resources: referred patients were mainly from District Y (89.8%) and District S (8.7%), and had a larger proportion of rural household registration, a poorer health foundation, higher health concern, a shorter distance to tertiary hospitals, and slightly lower-capacity primary hospitals. Among these factors, the demand-side factors of household registration and area of residence and the supply-side factors of accessibility to quality resources and capacity of primary care institutions all involve the allocation of healthcare resources in geographic space, in addition to the factor of patients' disease perceptions. This provides a preliminary basis for using graph neural networks to predict the effect of optimizing health care resource allocation.
Table 2
Comparison Table of Referred and Non-Referred Patient Statistics
| Referred patients | Non-referred patients |
Household registration | | |
Rural | 594(56%) | 5180(46.3%) |
Urban | 464(43.8%) | 6028(53.7%) |
Cost of primary care visits | | |
Average | 70.85 | 71.00 |
Patient's region | | |
District of Y | 951(89.8%) | 9912(88.4%) |
District of S | 93(8.7%) | 1198(10.6%) |
District of K | 14(1.3%) | 98(0.8%) |
Health Foundation | | |
1 | 54(5.1%) | 2490(22.2%) |
2 | 233(22.0%) | 3669(32.7%) |
3 | 297(28%) | 2644(23.5%) |
4 | 474(44.8%) | 2405(21.4%) |
Health concerns (number of visits) | | |
Average | 11.44 | 7.05 |
Median | 6 | 6 |
Accessibility of quality medical resources (distance to nearest tertiary hospital, km) | | |
Average | 5.19 | 5.50 |
Median | 3.15 | 4.76 |
Capacity of primary care institutions (treatment volume) | | |
Average | 928.27 | 955.23 |
Median | 1025 | 1025 |
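As a quick arithmetic check, the household registration counts in Table 2 reproduce the group sizes and the referral shares quoted in the text. The variable names below are illustrative only.

```python
# Counts taken from the household registration rows of Table 2.
referred = {"rural": 594, "urban": 464}
non_referred = {"rural": 5180, "urban": 6028}

n_referred = sum(referred.values())          # 1058 referred patients
n_non_referred = sum(non_referred.values())  # 11208 non-referred patients
cohort_size = n_referred + n_non_referred    # 12266 patients in the cohort

referred_share = round(100 * n_referred / cohort_size, 2)
non_referred_share = round(100 * n_non_referred / cohort_size, 2)
```

The same totals are obtained from the region rows and the health foundation rows of the table.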
Based on the above statistical analysis, we plot the distribution of all referred patients and of hospitals of different grades on a satellite base map (Fig. 3). As of 2018, District Y, District K, and District S are the three major municipal districts of City S; District Y is the seat of the Municipal Committee and the Municipal People's Government and the political, economic, and cultural center of City S. The residential areas of the main urban area are located in District Y, the northeast of District K, and the northwest of District S. Taking area and population density together, the primary hospitals are fairly evenly distributed near the residential areas, while the tertiary hospitals are mainly concentrated in District Y. The only tertiary hospitals in District K and District S are located in the centers of their more densely populated areas. We therefore believe that the distribution of basic medical resources in City S is relatively balanced, but that there are large regional differences in the distribution of high-quality medical resources.
Note
According to the results of China’s seventh census, the resident population of Y District is 1.02 million, the area under its jurisdiction is 498 km², and the population density is about 2048 people/km²; the resident population of K District is 1.0989 million, the area under its jurisdiction is 1041 km², and the population density is about 1047 people/km²; the resident population of S District is 839,700, the area under its jurisdiction is 1403 km², and the population density is about 592 people/km².
To explore the relationship between regional healthcare resource allocation and patient referrals further, we mapped the access trajectories of referred patients in geographic space (Fig. 4) and found that escalation referrals were concentrated in two clusters of quality resources. One is the group of tertiary hospitals in District Y, which received patients from remote areas in the southern part of the district as well as from District K and District S; the other is the group of tertiary hospitals in District S, which mainly received referred patients from within the district. Patients in District Y make escalation referrals that crowd quality medical resources, while patients in Districts K and S face the challenge of accessing care across regions. We believe that the unequal allocation of medical resources in City S not only reduces the use of primary hospitals by patients in District Y, but also makes it difficult for patients in Districts K and S to access quality resources. These problems must be addressed by improving the capacity of primary medical services and by extending quality resources downward, respectively.
The distribution of patients received by tertiary hospitals is shown in Table 3, which reflects the flow of referrals more clearly. Referral patients from District Y mainly flow to the district's high-quality medical resource centers: S City People's Hospital, S City College of Arts and Science Affiliated Hospital, S City Chinese Medicine Hospital, and Gaojiang Hospital. Roughly half of the referral patients from District S flow to S City People's Hospital in District Y, and roughly half flow to S District People's Hospital in their own district. District K is close to District S, and its referral patients are dispersed across the tertiary hospitals of City S. This provides an empirical basis for the subsequent study of region-specific supply optimization plans, namely increasing the supply of high-quality resources in District S and improving the service capacity of primary care institutions in District Y.
Table 3
Hospital flow statistics of referred patients
Hospital name (rows) / Patient's region (columns) | District K | District S | District Y | Total |
S City People's Hospital | 4 | 41 | 523 | 568 |
S City College of Arts and Science Affiliated Hospital | 1 | 3 | 185 | 189 |
S City Chinese Medicine Hospital | 2 | 3 | 154 | 159 |
Gaojiang Hospital | 0 | 0 | 50 | 50 |
S District People's Hospital | 0 | 40 | 2 | 42 |
S City No. 2 Hospital | 1 | 0 | 26 | 27 |
S City Centre Hospital | 3 | 0 | 22 | 25 |
S City No. 5 Hospital | 1 | 0 | 19 | 20 |
S City East Hospital | 1 | 0 | 8 | 9 |
Yue Du Hospital, District Y | 0 | 0 | 9 | 9 |
S City No. 7 Hospital | 0 | 0 | 6 | 6 |
No. 2 People's Hospital, District S | 0 | 5 | 0 | 5 |
Rehabilitation Hospital | 0 | 0 | 3 | 3 |
Chinese Medicine Hospital, District K | 1 | 0 | 2 | 3 |
Chinese Medicine Hospital, District S | 0 | 3 | 0 | 3 |
Prediction of the graph neural network
Among the four candidate models evaluated (Fig. 5), three, HetGNN, CompGCN, and HGT, showed the strongest prediction performance. HetGNN reached its best accuracy and loss values on the validation set after only 50 iterations and was therefore carried forward into the subsequent analysis. The highest AUC-ROC values obtained by these three models for the tertiary hospital prediction task were 97.45%, 98.48%, and 98.43%, respectively.
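For readers unfamiliar with the metric, AUC-ROC for link prediction is the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge. The following pairwise sketch is for illustration only (it is quadratic in the number of edges and is not the evaluation code used in the study); the labels and scores are made up.

```python
def auc_roc(y_true, y_score):
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Hypothetical labels (1 = edge exists) and predicted edge probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = auc_roc(labels, scores)
```

Here three of the four positive-negative pairs are ordered correctly, so the value is 0.75.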
Geo-visual representation of prediction results
Increase in tertiary hospitals
The distribution of all referral patients and tertiary hospitals in this study is shown in Fig. 6. We used the center point of the geographic locations of the referral patients in District S as the location of the additional tertiary hospital (red dot) to predict how much their cross-regional escalation referral behavior would be attenuated.
The predicted results of the four models are shown in Fig. 7. The graphs present the likelihood of patients being seen at the additional tertiary hospital versus the original referral tertiary hospital, with colors from light to dark (yellow to purple) representing visit probabilities from low to high. Comparing the four models shows, first, that all four predict a lower probability of long-distance cross-regional visits by patients from the southern part of District Y and from District S to the central tertiary hospital in District Y (the yellow lines are prominent). Second, the results of three models, RGCN, HetGNN, and CompGCN, show that the probability of District S referral patients being referred to tertiary hospitals within their own district remained high, indicating that the addition of a new tertiary hospital reduces cross-regional visits more than intra-regional referrals. In contrast, the results of HGT show that both cross-regional visits and intra-regional referrals were substantially reduced.
We treat edges with a predicted probability above 50% as existing and show them on the graph (Fig. 8), comparing the flow of visits before and after the addition of a tertiary hospital for each model. The results indicate that cross-regional referrals of patients from District S to the tertiary hospital in District Y were markedly reduced after the addition of the tertiary hospital, and that patients originally referred to the tertiary hospital in District S were also substantially diverted to the new tertiary hospital.
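The thresholding and before/after comparison can be sketched as follows. The hospital identifiers, probabilities, and function names are hypothetical and serve only to show the mechanics of the 50% rule.

```python
from collections import Counter

def predicted_flows(edge_probs, threshold=0.5):
    """Count predicted visits per hospital, keeping edges with probability above the threshold."""
    return Counter(
        hospital for (_, hospital), prob in edge_probs.items() if prob > threshold
    )

def flow_change(before, after):
    """Per-hospital change in the number of referral edges (after minus before)."""
    hospitals = set(before) | set(after)
    return {h: after.get(h, 0) - before.get(h, 0) for h in hospitals}

# Hypothetical observed flows and predicted probabilities after adding "T_new".
before = Counter({"T_Y": 3, "T_S": 1})
edge_probs = {
    ("p1", "T_Y"): 0.2, ("p2", "T_Y"): 0.7, ("p3", "T_Y"): 0.4,
    ("p4", "T_S"): 0.9, ("p1", "T_new"): 0.8, ("p3", "T_new"): 0.6,
}
after = predicted_flows(edge_probs)
change = flow_change(before, after)
```

A negative value for an existing hospital and a positive value for the new one correspond to the diversion of referral flows described in the text.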
Improving the capacity of primary hospitals
In addition to cross-regional consultations, there is also a large number of intra-regional referrals, which may be related to differences in the distribution of medical resources at various levels within the health care delivery system. We therefore improve the capacity of primary hospitals in District Y with a below-average number of attendances and predict the change in patients' escalation visit behavior. The primary hospitals whose capacity is to be improved are shown in Fig. 9. The pink dots in the figure are the primary hospitals whose capacity is to be improved, the yellow dots are the tertiary hospitals to which patients are referred, and the edges are the access trajectories of referral patients.
The predicted results of the four models are shown in Fig. 10. The graph presents the probability that patients living near low-capacity primary hospitals still choose escalation visits after the capacity of those hospitals is increased, with colors from light to dark (yellow to purple) representing visit probabilities from low to high. All four models predict, to varying degrees, a reduction in referrals from remote areas of District Y to the center of quality medical resources (yellow and orange lines). However, the results of the RGCN, HGT, and CompGCN models indicate that the medical center of District Y, the People's Hospital of City S, still attracts patient referrals from various remote areas (purple and dark red lines).
Again, edges with a predicted probability above 50% are treated as existing and displayed on the graph (Fig. 11), and the flow of visits before and after increasing the capacity of primary hospitals is compared for each model. The results show that intra-regional long-distance referrals of patients on the periphery of District Y were markedly reduced after the capacity of primary care institutions was improved, but the siphoning effect of regional medical centers remained evident.
Finally, we conclude that in areas where quality medical resources are scarce, quality resources should be directed more toward the primary level, whereas in areas with sufficient quality resources, the focus should be on improving the capacity of primary care institutions to avoid overcrowding of quality medical resources and the resulting waste. At the same time, we must acknowledge that the authoritative status of an inner-city medical center will attract a higher number of patient referrals regardless of the adequacy of the surrounding medical resource supply. Therefore, achieving the equalization of regional basic medical services also depends on extending the radiation range of regional medical centers and making high-quality medical resources universally available through measures such as medical associations and branch hospitals.