Development of a Distance Cost Estimation Model for Health Care Based on Public Official Information

Background: Health service accessibility indicators that are reliable, valid, timely, and easy to apply in policy can help clarify the current supply of and demand for medical resources in a region and help the government allocate resources more effectively. In developing such indicators, however, it is difficult both to protect the privacy of patients' residences when working from large databases or medical records and to obtain the transportation cost of the actual use of medical and health services. The purpose of this study was to develop a distance cost index based on national public information on disease prevalence and regional population in order to estimate the distance cost of actual service users. Such an index could protect the privacy of the patient's residence and overcome a limitation of past work based on the national health insurance database. Methods: This cross-sectional study used secondary data analysis with SPSS and QGIS. Medical treatment distance was calculated for two groups, the Verification group and the Index development group. The Verification group was drawn from a medical center's actual records of patients with diabetes and high blood pressure during 2017-2019, and the Index development group from prevalence data in national public information. Finally, the consistency of the two groups' medical treatment distances was compared to verify the accuracy of the Index development group. Results: The estimated distances of the Index development group were highly consistent with those of the Verification group (ICC>0.9), and after adjusting for age and gender the R-square values were also excellent (98.1%, 92.7%). The distance cost formula for health care developed in the present study relies on prevalence and population from public information, is easy to apply in policy, and can protect the privacy of patients in the future.


Introduction
Since its implementation in 1995, National Health Insurance has not only reduced the financial barriers people face in seeking medical care but also reduced poverty due to illness. However, many studies have found that even after the implementation of national health insurance, inadequate or unequal medical resources remain widespread in remote areas of Taiwan [1][2][3]. To allocate medical resources properly, the government has gradually improved the uneven distribution and quality of medical care through the regionalized health care concept since 1985, but it still cannot resolve the concerns about justice that arise from insufficient and uneven medical care resources in rural areas and the long distances to physicians.
The accessibility indicators most often applied and discussed by health policy decision-makers are the number of specialist medical personnel, the number of hospital beds, and the ratio of medical services to the regional population, among others, none of which carries geographical information [4][5][6][7][8][9]. Other accessibility indicators are the distribution density of medical resources, the distance and travel time for residents to seek medical care, and specific individual medical care needs; these require more geographic and private information as parameters for inference. They bear more directly on justice and equity in resource allocation [3][10][11].
Some studies have pointed out that the utilization of and demand for medical services are affected by distance: the longer the distance between consumers and medical providers, or the more time spent in seeking care, the lower the medical utilization and demand [12][13]. Therefore, accurate measurement and appropriate configuration of regional medical care services are necessary for the effective improvement of medical services.
To obtain people's actual medical distance accurately, it is necessary to know the patient's residence and the terrain or transportation route from that residence to the medical institution. Estimating medical accessibility more accurately often also requires including all the medical resources in the region, whether the visit takes place during rush hour, and other factors.

The 2SFCA and E2SFCA indicators developed between 2005 and 2016 by Wang & Luo, McGrail & Humphreys, Luo & Qi, and Kilic required, in addition to spatial information, personal data such as age, gender, race, income, and occupation, as well as urban development information for the patient's residence, which made them difficult to use in policy [14][15][16][17]. Although the newer indicators developed by Yen & Lin in 2015 and 2019 are greatly simplified, they still use the distance between the patient's residence and the healthcare provider. It follows that the key variable in estimating the distance cost is "where the patient lives": the more specific the address, the more exact the estimate [3,18].
The respect and importance that developed democracies place on human rights and personal privacy are reflected in many laws. Taiwan's national health insurance database, with a 99% coverage rate, contains data on medical records and medical conditions and has thus become an important source for the government to estimate people's health care utilization and needs.
However, human rights and individual privacy must be respected [19]. The Personal Data Protection Act applies to both public-sector and private institutions. As a result, the national health insurance database not only lacks social factors in the patient data but also retains only the insured area (at a larger regional level) as the patient's location; it does not reveal the neighborhood in which people live or their actual address. In the past, the insurance records of Taiwan's National Health Insurance database could be used to estimate medical distance only at the town administrative level, so the distance to hospitals was mostly calculated from the population centers of towns. In Hualien County, for example, there are only 13 estimated distances between users and one medical care provider because there are 13 towns in Hualien. This is over-simplified and biased information for estimating the distance cost of medical care, especially in remote areas where residents are highly dispersed.
To estimate the population's distance cost for health care more accurately, it is necessary to develop a method that estimates the distance between users and medical care providers without using their addresses and with more validity than the administrative unit "town". Since the population of each township is not uniformly distributed but clustered or randomly distributed, it is more accurate to calculate travel distance from a population-weighted center, which combines population counts with the geographical center [20], especially in remote areas where population density is uneven. We also hope that the new method in the present study will protect patient privacy and address the limitation of databases in which personal location variables reach only the township level.
Therefore, we chose diabetes and hypertension, chronic diseases with high prevalence in eastern Taiwan, as our targets. Diabetes was the fifth leading cause of death in eastern Taiwan in 2019. Diabetes and hypertension carry a high risk of complications, including cardiovascular disease, stroke, skin ulceration, retinopathy, neuropathy, kidney failure, and amputation, and about half of the patients will die from heart disease and stroke [21]. Using official data on prevalence and area population, we estimated the number of cases and their medical distance as our index development group, and we used real outpatient data from one medical center to calculate the actual medical distance for comparison. We expect that the new index can be applied in further surveys as a high-precision distance cost index.
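The idea of a population-weighted center can be illustrated with a minimal sketch. The coordinates and population counts below are hypothetical and are not taken from the study data; the function names are ours and do not correspond to any QGIS routine.

```python
def population_weighted_center(areas):
    """Return the population-weighted mean of area center points.

    areas: list of (x, y, population) tuples, with x and y in a
    projected coordinate system (hypothetical values in this sketch).
    """
    total = sum(pop for _, _, pop in areas)
    if total <= 0:
        raise ValueError("total population must be positive")
    x = sum(x * pop for x, _, pop in areas) / total
    y = sum(y * pop for _, y, pop in areas) / total
    return x, y


def geometric_center(areas):
    """Return the unweighted mean of the same center points."""
    n = len(areas)
    return (sum(a[0] for a in areas) / n, sum(a[1] for a in areas) / n)


# Hypothetical township: most residents cluster near the first area.
areas = [(0.0, 0.0, 900), (10.0, 0.0, 50), (10.0, 10.0, 50)]
weighted = population_weighted_center(areas)
unweighted = geometric_center(areas)
print(weighted, unweighted)
```

When the population is clustered, the weighted center is pulled toward the cluster, whereas the geometric center is not; this is the reason weighting is preferred where population density is uneven.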

Materials and Methods
This study was a cross-sectional study. The study framework is shown in Fig 1. There were two groups in our study, and we compared the medical distance between them. In order to estimate the cost of medical distance in each community from the currently available data, diabetes

1. Diabetes is defined as E08-E13 in the first three codes of ICD_10_CM of any primary diagnosis in the outpatient prescription and treatment details (CD).

2. Hypertension is defined as I10-I15 in the first three codes of ICD_10_CM of any primary diagnosis in the outpatient prescription and treatment details (CD).

3. The Basic Statistical Area: This is the minimal spatial unit, created when the government re-divided the administrative areas for the National Development Plan in Taiwan. The development of the new units took many social and geographic factors into account, and the units supply more information and applications than the administrative units (National Internal Affairs' Open-Data platform, 2021). Basic statistical areas can be aggregated into higher-level dissemination areas (above the 2nd "Dissemination Area"). In the present study, we used the "community" as our spatial unit and counted the number of basic statistical areas in each custom dissemination area (Table 3).

2. Index development group
The official public data, including population numbers and the prevalence of diabetes and hypertension in Yuli of Hualien from 2017 to 2019, obtained from the National Development Council and the Central Health Insurance Bureau, served as the estimation materials of the Index development group [22] (Table 2).

In this study, data collation and statistical analysis were conducted using QGIS version 3.6, SAS statistical software version 9.4, and SPSS version 21. α = 0.05 was considered statistically significant in all tests. ArcGIS was used to draw the figures.

Spatial Analysis
We used spatial analysis to calculate and estimate the medical distances of the two groups:

1. For the validation group, the medical distance was calculated from the outpatients' addresses in the medical records. Fig 2 shows our graphical representation. A patient's home location is denoted by $(i, j)$, and the distance from a home at location $ij$ to the medical center by $d_{ij}$; that is, $d_{ij}$ is the distance between a user living at location $ij$ and the hospital. The total distance of all patients is then $\sum d_{ij}$. We selected the patients who lived within the basic statistical areas of Yuli.

2. For the estimated medical distance of the index development group, we first used the official public population data for the basic statistical areas from the National Internal Affairs' Open-Data platform to estimate the number of cases with diabetes and hypertension in Yuli. Second, we calculated the actual distance along the road network from the center of each minimum statistical area to the medical center. Finally, we used QGIS to build an "Origin-Destination Matrix" to analyze the distance cost of medical care from each basic statistical area to the hospital. One of the aims of this study is to show that the following relation holds (Fig 3):

$$\sum d_{ij} \approx \sum D_{kk}$$

where $d_{ij}$ is the distance between a user living at location $ij$ and the hospital, and $D_{kk}$ is the distance between the population center at location $kk$ and the hospital.
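The estimation steps for the index development group can be sketched as follows. This is only an illustration under stated assumptions: the populations, prevalence, and road distances are hypothetical, and weighting each area's road distance by its estimated number of cases is our reading of the procedure, not a formula given in the text.

```python
def estimate_cases(population, prevalence):
    """Estimated cases in one area = resident population x prevalence."""
    return population * prevalence


def estimated_total_distance(areas, prevalence):
    """Sum of (estimated cases x road distance to the hospital) over areas.

    areas: list of dicts with hypothetical keys 'population' and 'road_km',
    where 'road_km' is the road-network distance from the area's center
    to the hospital (one row of an origin-destination matrix).
    """
    return sum(
        estimate_cases(a["population"], prevalence) * a["road_km"]
        for a in areas
    )


# Hypothetical basic statistical areas and a hypothetical 10% prevalence.
areas = [
    {"population": 1000, "road_km": 80.0},
    {"population": 500, "road_km": 90.0},
]
total = estimated_total_distance(areas, prevalence=0.10)
print(total)
```

No patient address enters the calculation; only area-level population, a published prevalence, and the road distance from each area center are needed.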

Statistical Method
Percentages, means, and standard deviations were used to describe the data. We used consistency analysis, reporting the intraclass correlation coefficient (ICC), to compare the distances of the two groups. Multiple regression analysis was used to adjust for age and gender and to reconfirm the correlation between the two groups.
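For readers unfamiliar with the ICC, a minimal sketch of one common form is given here. The text does not state which ICC form was computed in SPSS, so the choice of the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)) is an assumption, and the distances in the example are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of rows, one row per target (e.g. a community) and one
    column per measurement method (e.g. actual vs. estimated distance).
    """
    n = len(ratings)        # number of targets
    k = len(ratings[0])     # number of methods
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical distances per community: [actual, estimated].
distances = [[78.2, 78.9], [81.5, 81.1], [85.0, 85.6], [90.3, 89.8]]
print(icc_2_1(distances))
```

Values close to 1 indicate that the two methods give nearly the same distance for each community.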

Characteristics of participants and the potential population with diabetes and hypertension
The data in Table 1 came from official public information based on the insured population and from the Nutrition and Health Survey in Taiwan [23]. The prevalence of diabetes and hypertension in Yuli town increased year by year from 2017 to 2019, and more markedly than in Hualien County (Table 1). The prevalence of diabetes and hypertension among adults during 2015-2018 in the Nutrition and Health Survey in Taiwan (NAHSIT) was higher than in Hualien and Yuli because the survey covered only adults aged ≥18 years. Table 2 shows the resident population from the demographic data of the Ministry of the Interior, which was larger than the insured population. We used the resident population and prevalence to estimate the potential patients with diabetes and hypertension in Yuli. Over these three years, the potential patients with diabetes numbered 2289, 2425, and 2517, and the potential patients with hypertension numbered 4453, 4585, and 4692; these potential patients in Yuli formed the index development group in our study. We could not find age and gender data for the population with diabetes, hypertension, or any other disease in the official public data.
The age and gender information of outpatients from the medical center in the validation group is shown in Table 3 (Table 4). Table 4 also shows the number of basic statistical areas in each community, together with the actual and estimated numbers of cases in these communities.

The accuracy of estimated medical distance
We used the ICC to evaluate the accuracy of the estimated medical distance in the index development group. Table 6 shows, for diabetes, the ICC between the medical distances of the validation and index development groups. Whether the basic statistical areas or the population-weighted centers of the 15 communities were used as the departure points to the medical center, the two groups were highly correlated in 2017, 2018, and 2019, with all ICCs above 0.98 (p<0.001). For hypertension (Table 7), the outcomes were the same as for diabetes: the two groups were highly correlated (ICC: 0.96-0.99, p<0.001).
We also used multiple regression analysis to adjust for the age and gender of the validation group for diabetes and hypertension (Tables 8 and 9). After adjustment for age and gender, the medical distance of the validation group remained significantly correlated with that of the index development group (p<0.001); the adjusted β was above 0.95 in every year, and the adjusted R² of every regression model was above 0.95. All regression outcomes can be presented as a formula. For example, for diabetes in 2017, based on Table 8:

Estimated medical distance = 0.994 × medical distance of the validation group + 0.015 × gender + 0.005 × age (years)

where gender is 1 for male and 0 for female; gender and age were not significant (p>0.05).

The ICC and regression results are intended to show that the following relation can be established (Fig 3):

$$\sum d_{ij} \approx \sum D_{kk}$$

where $d_{ij}$ is the distance between a user living at location $ij$ and the hospital, and $D_{kk}$ is the distance between the population-weighted center at location $kk$ and the hospital.
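As a worked illustration of the 2017 diabetes equation, the sketch below applies the three reported coefficients to one hypothetical patient. No intercept is reported in the text, so the sketch assumes none; the input values are invented for the example, and the distance unit is whatever unit the validation distance is expressed in.

```python
# Coefficients reported in the text for diabetes in 2017 (Table 8).
BETA_DISTANCE = 0.994
BETA_GENDER = 0.015   # gender coded 1 = male, 0 = female
BETA_AGE = 0.005      # age in years


def estimated_distance_2017_dm(validation_distance, gender, age):
    """Apply the reported regression equation (assumes a zero intercept)."""
    return (
        BETA_DISTANCE * validation_distance
        + BETA_GENDER * gender
        + BETA_AGE * age
    )


# Hypothetical patient: validation distance of 80, male, 60 years old.
# 0.994 * 80 + 0.015 * 1 + 0.005 * 60 = 79.52 + 0.015 + 0.30 = 79.835
print(round(estimated_distance_2017_dm(80.0, 1, 60), 3))
```

The gender and age terms contribute well under one unit here, which is consistent with their reported lack of significance.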

Discussion
The most important contribution of the present study is an alternative to estimating distance from actual addresses: public population and disease prevalence data provide a high-precision method for estimating medical distance without the real location of the patients. We developed a formula for calculating the cost of distance that can be applied to people with chronic diseases. Once the visit frequency of the patients is collected, the medical distance cost can be obtained. Some conditions, however, must be considered in application.
The method rests on several assumptions that should be checked before it is used.

The types of the diseases in application
In the present study, we used DM and hypertension as targets for estimating medical distance because chronic diseases are stable and the proportion of patients cured is lower than for acute diseases, trauma, accidents, and other non-chronic conditions, so they must be counted when a local government estimates long-term medical costs. In addition, patients with chronic diseases take medication at a stable frequency and tend to visit a specific physician or the same hospital [24][25]. The goal of Taiwan's long-term care policy is "aging in place": most long-term care services are set up according to where elders live, so even those suffering from diabetes and hypertension tend to stay in their familiar household registration areas (refs) [26]. It has been pointed out that the spatial interaction between people seeking care and hospitals is influenced by their residence and by social demography (such as gender, race, and socioeconomic status), so we also adjusted for age and gender when examining the relationship between the medical distances of the two groups by regression (Tables 8 and 9) [27][28][29]. Most townships in Hualien County, our target county, are rural areas of Taiwan, where the elderly population consists largely of long-term residents or indigenous people. Therefore, estimating medical costs from the medical distance of chronic patients is more accurate than doing so for acute diseases.
Given these characteristics, a target disease used to estimate medical distance must involve stable and fixed patterns of medical treatment. In addition to chronic diseases, the method of this study should therefore also suit some rare diseases, such as chromosomal abnormalities and autoimmune diseases such as systemic lupus erythematosus (SLE), as well as populations with special needs who must use rehabilitation medicine and early treatment. To estimate long-term medical costs for other non-chronic diseases or accidents, one must first confirm that the target groups live in the area and establish their frequency of medical utilization.

Medical Treatment Selection/The residents' behavior in seeking healthcare
One important factor affects the estimation of medical costs with this method: residents' behavior in seeking healthcare, which depends on the kind of disease, gender, income, and the medical resources near their residence, among other factors [24][25][30][31]. If the aim is only to estimate medical costs for a small area or a single medical institution, a patient's total medical utilization frequency can be taken as equal to his or her use of services at the target hospital, and the distance between the patient's location and the target hospital can be used for the estimate. If the area has more medical resources, or the target diseases are ones, such as early-stage cancer, that lead people to seek care at several different institutions (medical services shopping) [32], then more information about patients' various medical utilization must be collected to estimate their medical distance. Estimating medical distance cost therefore requires not only accurate distance measurement but also information on patients' medical utilization behavior.

The travel time and distance
For convenience, travel distance is usually calculated as the straight-line distance between medical service consumers and providers. Such algorithms not only ignore the real differences between travel distance and the selection of medical resources but also fail to consider areas where obstacles (e.g., rivers and mountains) leave the road network incomplete. For calculating the cost of medical treatment distance, it is therefore more appropriate to calculate travel time based on the actual road network than to use the straight-line distance. In Hualien County, however, there are only two main roads between Yuli and the medical center. One is a coastal route used for sightseeing; the other is a mountain route that local residents use for medical treatment and business, which is also the shortest straight-line distance and was used to estimate medical distance in our study. Using straight-line distance would therefore not affect our results, and it reflects the real situation in Hualien. In counties or cities with convenient transportation and multiple travel paths, the estimate must be made carefully.
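The gap between straight-line and road-network distance can be checked with a short sketch. The haversine formula used here is a standard great-circle approximation and is not part of the study's QGIS workflow; the coordinates and the road distance are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (straight-line) distance in km between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def detour_ratio(road_km, straight_km):
    """Road distance divided by straight-line distance (>= 1 in practice)."""
    return road_km / straight_km


# Hypothetical origin, hospital, and road-network distance.
straight = haversine_km(23.30, 121.30, 24.00, 121.60)
print(straight, detour_ratio(95.0, straight))
```

A detour ratio close to 1 suggests that the straight-line distance is an acceptable proxy; a much larger ratio signals terrain or network obstacles that the straight line ignores.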

The potential benefit of application in medical resources allocation policy
Medical resource allocation policy is a very important issue for a country with national health insurance and a national health care service system. The distribution of medical resources must conform to distributive justice. Traditionally, the central government relies on the population, or the ratios of physicians, nurses, and hospital beds to population, in each local government area to prioritize medical resource support or to identify where resources are deficient, because this is a quick and convenient method. Sometimes there are special medical support programs for people with specific diseases or those living in remote areas or outlying islands (refs).
However, these special medical support programs target sub-groups or less developed areas; they are supplementary programs and do not address justice in allocation. Some studies point out that allocation policy should refer to the health needs of the population, and that health needs (health demands) are usually based on medical utilization, including outpatient services, inpatient services, and medication use, and on disease incidence and prevalence [15,33]. Medical needs should also take into account medical distance or spatial distribution within the local government area. At present, access to medical resources is most often measured by the shortest path, the shortest travel time, and the ratio of medical services to the population [33][34][35]. Obtaining the actual shortest path or shortest travel time requires knowing patients' addresses, yet it is increasingly difficult to obtain private or identifiable individual information from the national health insurance database or from other medical records without the patients' permission.
The method for estimating medical distance developed in the present study uses only official public information. It not only protects the privacy of patients but also saves much of the time needed to measure and compare the medical cost due to distance across local governments. Going a step further, it can provide evidence for requesting a more reasonable resource allocation strategy from the central government.

Conclusion
To address the limitations of existing databases and protect patient privacy, this study aimed to develop a policy-ready cost indicator of medical distance from official government data, overcoming the limitation that data from the National Health Insurance database reach only the township level, which oversimplifies distance measurement. Drawing on national health insurance data on the incidence or prevalence of chronic diseases and on their distribution, we developed a convenient procedure for estimating medical distance, and the distance estimated by this method proved to be highly correlated with the actual distance. The method can contribute to public health policy planning and health care promotion, can be deployed in advance in medical resource allocation, and can improve the quality of regional medical and preventive health care.

Acknowledgments
The authors would like to thank Dr. Shyang-Woei Lin and Dr. Tsung-Cheng Hsieh for their assistance in drawing and data analysis, which made this study possible. The authors are also grateful to the members of Hualien Tzu Chi Hospital and the Eastern Division of the National Health Insurance Administration for helping to collect the data.

Ethics approval and consent to participate:
This research was not funded by any supporters. We also declare that we have no financial interests related to the material in the manuscript.

Conflicts of Interest:
The authors declare no conflicts of interest.