A China Healthcare Security Diagnosis Related Groups Script Development Based On R Language

Background: To develop a set of R scripts that could efficiently and accurately identify the home page information of medical records and perform China Healthcare Security Diagnosis Related Groups (CHS-DRG) simulation grouping. Methods: Based on the CHS-DRG grouping rules, we abstracted the DRG grouping process into a standard algorithm and compiled the R script Z-DRG. The DRG simulation groupings produced by Z-DRG were compared with the DRG results from the regional CHS-DRG integrated service platform to evaluate accuracy. Results: Z-DRG includes one function module (zdrgfun.Rc), one operation module (zdrgpro.R) and one database form (zdrgcodes.RData). The function module defines 7 algorithm steps and 8 custom functions. The functions handle multiple diagnoses, multiple operations, and joint diagnosis and operation. CHS-DRG simulation grouping of one case took only (17.85±0.11) milliseconds. Compared with the regional CHS-DRG results, the accuracy rate was 99.10%. Differences in the number of other diagnoses were the main factor affecting accuracy. Conclusions: Z-DRG is easy to operate. Its CHS-DRG simulation groupings were efficient and accurate. The simulation results could be effectively applied by medical institutions to carry out CHS-DRG grouping prediction and improve the implementation of CHS-DRG payment work.


Background
Diagnosis-related groups (DRGs), one type of case mix classification, is a method that groups patients into several diagnostic groups based on factors such as age, sex, disease diagnosis, comorbidities, complications, treatment methods, outcomes, and resource consumption. It is an important tool for measuring medical quality, evaluating cost-effectiveness, and confirming medical insurance payments. To deepen the reform of medical insurance payments, the Chinese National Healthcare Security Administration promoted national DRG payment pilot work in 30 cities in 2019 and enabled actual DRG payments in 2021.
At present, research on DRG mainly focuses on building statistical models to improve DRG rules [1][2], using DRG data for medical care quality evaluation [3][4][5], performance appraisal [6], disease cost measurements [7], medical behavior research [8][9], and hospital healthcare [10]. However, there are few studies on compiling the unified DRG scheme into executable programs for DRG simulation grouping. Research has shown that medical institutions should clarify the principle of DRG, establish timely DRG simulation grouping, improve the quality of the home page of medical records, correct ICD coding, and standardize medical behaviors to ensure that they maintain healthy development during the DRG payment reform process [11]. Therefore, we developed a set of CHS-DRG simulation grouping scripts based on R language, which could efficiently and accurately perform CHS-DRG simulation grouping, to promote CHS-DRG payment pilot work in medical institutions.

1) Major Diagnostic Categories (MDC)
Based on the main diagnosis of the case, and considering the anatomical location, etiology and clinical manifestations, the case was classified into an MDC. Organ transplantation, ventilator use ≥96 hours or ECMO cases were classified into MDCA (preliminarily grouped diseases and related operations). Cases aged less than 29 days were classified into MDCP (newborns and other neonates with conditions originating in the perinatal period). HIV cases were classified into MDCY (human immunodeficiency virus infections and related operations). Cases with two or more severe traumas were classified into MDCZ (multiple significant trauma). MDCA, MDCP, MDCY and MDCZ were pre-major diagnostic categories (pre-MDCs), which were grouped with priority, in that order. Reproductive system diseases were classified by sex into MDCM (male) or MDCN (female).
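The pre-MDC priority described above can be sketched in R as follows. This is only an illustration and not the Z-DRG source: the function name `assign_pre_mdc` and its arguments are hypothetical, and the flags are assumed to have been derived beforehand from the coded diagnoses and operations.

```r
# Hypothetical helper: return the first matching pre-MDC, or NA if none applies.
# The checks run in the priority order given in the text: MDCA, MDCP, MDCY, MDCZ.
assign_pre_mdc <- function(age_days, transplant_vent96_ecmo, hiv, n_severe_trauma) {
  if (transplant_vent96_ecmo) return("MDCA")  # transplant, ventilator >= 96 h, or ECMO
  if (age_days < 29)          return("MDCP")  # cases younger than 29 days
  if (hiv)                    return("MDCY")  # HIV infection
  if (n_severe_trauma >= 2)   return("MDCZ")  # two or more severe traumas
  NA_character_                               # no pre-MDC: continue with normal MDC assignment
}

assign_pre_mdc(age_days = 10, transplant_vent96_ecmo = FALSE,
               hiv = TRUE, n_severe_trauma = 0)
```

Because the checks return early, a case that satisfies several pre-MDC conditions is assigned to the one that comes first in the priority order.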

2) Adjacent Diagnosis Related Groups (ADRG)
The main diagnosis and main operation of the case were compared with those defined for each ADRG, and the case was enrolled, in order, into a surgical operation ADRG, a nonsurgical operation ADRG, or a medical diagnosis ADRG.
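The ordered enrollment can be illustrated with a short R sketch. The function `pick_adrg`, the type labels, and the ADRG codes in the example are placeholders invented for illustration; the sketch assumes that the ADRGs whose criteria a case satisfies have already been found.

```r
# Order in which ADRG types are tried (labels are hypothetical).
adrg_order <- c("surgical", "nonsurgical_operation", "medical")

# candidates: data frame of matching ADRGs with columns
#   adrg (code) and type (one of adrg_order).
pick_adrg <- function(candidates) {
  if (nrow(candidates) == 0) return(NA_character_)
  type_rank <- match(candidates$type, adrg_order)
  candidates$adrg[which.min(type_rank)]  # earliest type in the order wins
}

cand <- data.frame(
  adrg = c("ADRG_MED", "ADRG_SURG"),     # placeholder codes
  type = c("medical", "surgical"),
  stringsAsFactors = FALSE
)
pick_adrg(cand)
```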

R scripts compilation
Our research team transformed the CHS-DRG grouping principles into executable scripts using R language. We refined the CHS-DRG grouping principle into multiple executable algorithm steps and then clarified the input and output indicators of the core algorithm. The main algorithm was implemented as R functions. The R script Z-DRG was developed in the R 4.1.0 64-bit environment. It ran without problems on 32-bit or 64-bit Windows 7 or Windows 10 with R version 4.1 or later.

CHS-DRG simulation grouping accuracy evaluation
This study was approved by the Medical Research Ethics Committee of Fujian Medical University Affiliated Nanping First Hospital (NO. NPSY2020100006). The home pages of medical records from January to September 2021 were extracted from a tertiary hospital, and CHS-DRG simulation grouping was carried out.
The simulation results were compared with the actual DRG results from the regional CHS-DRG integrated service platform, and validity was evaluated by the accuracy rate. The grouping differences were analyzed to improve the performance of Z-DRG.
[...] to injury), WJ1 (Burns with any operating room procedures except skin grafting), and XJ1 (Other diagnoses of contact with health services accompanied by operating room procedures), should be considered operating room procedures or special surgical operations in these MDCs.
In CHS-DRG, 121 ADRGs were subdivided according to the level of complications and comorbidities (MCC, CC or no CC) into DRG with MCC, DRG with CC, and DRG without CC, while 255 ADRGs went directly into subdivision groups without regard to CCs.
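The subdivision rule can be sketched as below. This is a simplified illustration under stated assumptions: the function `drg_subgroup` is hypothetical, the CC level of each other diagnosis is assumed to be known already, the highest level present is assumed to decide the subgroup, and any exclusion rules between the main diagnosis and a CC are ignored.

```r
# cc_levels:  CC level of each other diagnosis, "MCC", "CC" or "none"
# subdivided: TRUE if the ADRG is one of those split by CC level
drg_subgroup <- function(cc_levels, subdivided) {
  if (!subdivided)          return("not subdivided")
  if ("MCC" %in% cc_levels) return("with MCC")
  if ("CC" %in% cc_levels)  return("with CC")
  "without CC"
}

drg_subgroup(c("none", "CC", "MCC"), subdivided = TRUE)
```

A case with no other diagnoses (an empty `cc_levels`) falls through to "without CC".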

CHS-DRG Simulation Grouping Algorithm
We standardized the grouping process of CHS-DRG and developed a set of R scripts, named Z-DRG. Z-DRG included a function module (zdrgfun.Rc) defining the custom functions and algorithms, an operation module (zdrgpro.R) that ran the program to produce DRGs, and a database form (zdrgcodes.RData) that stored the CHS-DRG grouping information.
The function module defined 8 custom functions (Table 1) and 7 algorithm steps (Figure 2). The operation module loaded the function module and the database form, imported the home page data of medical records, performed batch DRG grouping, and generated a 17-column table of grouping results.
The database form contained a diagnosis DRG grouping table, an operation DRG grouping table, and a CHS-DRG codes table. CHS-DRG simulation grouping with Z-DRG was performed according to the following algorithm steps.
Step 1. Start up the R work platform. The database form (zdrgcodes.RData) was imported with the "load" function, the function module (zdrgfun.Rc) was loaded with "loadcmp" from the "compiler" package, and the required packages ("reshape2", "dplyr", "stringr" and "openxlsx") were checked. The environment was then ready to run the CHS-DRG process.
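A start-up check along these lines might look as follows. This is a sketch rather than the Z-DRG code: the helper `missing_packages` is hypothetical, and the two loading calls are shown as comments because they assume the Z-DRG files are present in the working directory.

```r
required <- c("reshape2", "dplyr", "stringr", "openxlsx")

# Hypothetical helper: names of the given packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

not_installed <- missing_packages(required)
if (length(not_installed) > 0) {
  message("Install before running: ", paste(not_installed, collapse = ", "))
}

# With the Z-DRG files in the working directory, loading would then be:
# load("zdrgcodes.RData")          # database form
# compiler::loadcmp("zdrgfun.Rc")  # byte-compiled function module
```

`requireNamespace()` only tests availability and does not attach the package, so the check has no side effects on the search path.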
Step 2. Data preparation. The home page information exported from the Medical Records Management System was read into the R platform with the "read.xlsx" function of the "openxlsx" package. According to the DRG grouping requirements, the home page data, which included medical records number, gender, age, hospitalization days, main diagnosis code, main operation code, discharge method, total hospitalization cost, all other diagnosis codes, and all other operation codes, were extracted by the "dsdata" function.
By matching against the diagnosis DRG grouping table and the operation DRG grouping table from the database form (zdrgcodes.RData), the complete diagnosis code table and the complete operation code table were generated.
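The idea of building a complete diagnosis code table can be illustrated with base R. Everything here is a toy assumption: the column names, the diagnosis codes, and the layout of the grouping table are placeholders, and in practice the home page data would come from a spreadsheet (for example via `openxlsx::read.xlsx`) rather than being typed in.

```r
# Toy home page extract (placeholder column names and codes).
home <- data.frame(
  record_no  = c("A001", "A002"),
  main_dx    = c("DX1", "DX2"),
  other_dx_1 = c("DX3", NA),
  other_dx_2 = c("DX4", NA),
  stringsAsFactors = FALSE
)

# Stack main and other diagnosis columns into one long table.
dx_cols <- c("main_dx", "other_dx_1", "other_dx_2")
dx_long <- data.frame(
  record_no = rep(home$record_no, times = length(dx_cols)),
  position  = rep(dx_cols, each = nrow(home)),
  code      = unlist(home[dx_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
dx_long <- dx_long[!is.na(dx_long$code), ]

# Placeholder diagnosis grouping table, then a left join on the code.
dx_table <- data.frame(
  code = c("DX1", "DX2", "DX3"),
  adrg = c("ADRG_A", "ADRG_B", "ADRG_C"),
  stringsAsFactors = FALSE
)
dx_whole <- merge(dx_long, dx_table, by = "code", all.x = TRUE)
dx_whole
```

Codes that are absent from the grouping table (here "DX4") are kept with a missing ADRG, which makes unmatched codes easy to audit.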
Step 3. Pre-grouping. First, with MDC set to MDCA, the "uni.s" and "adrg.sj" functions determined whether the case could be grouped into an ADRG of MDCA. Second, with MDC set to MDCP, the "adrg.n" function determined whether the case could be grouped into an ADRG of MDCP. If the case was grouped into both PU1 (normal newborn) and PV1 (infant diseases diagnosed from neonates (29 days ≤ birth age <1 year old)), the two were distinguished by whether the case's age was less than 29 days. Third, with MDC set to MDCY, the "adrg.dj" function determined whether the case could be grouped into an ADRG of MDCY. When the patient underwent surgery, grouping into YC1 was checked first. Fourth, the "uni.d" function combined the main diagnosis and other diagnoses to determine whether the case could be grouped into MDCZ. When the MDC was MDCZ, the case was classified into the specific ADRG of MDCZ by the "adrg.n" function. If no surgical ADRG of MDCZ could be assigned, the case was grouped into ZZ1 (multiple significant traumas with no operating room procedures).
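Two of the tie-breaking rules in this step can be sketched as small R functions. Both function names are hypothetical and the code is not taken from Z-DRG. The direction of the age rule (younger than 29 days gives PU1) is inferred from the definition of PV1 and should be read as an assumption.

```r
# Resolve a case that matched both PU1 and PV1 using age in days.
resolve_newborn <- function(matched_adrgs, age_days) {
  if (all(c("PU1", "PV1") %in% matched_adrgs)) {
    return(if (age_days < 29) "PU1" else "PV1")
  }
  matched_adrgs
}

# Within MDCZ, fall back to ZZ1 when no surgical ADRG was assigned (NA).
resolve_mdcz <- function(surgical_adrg) {
  if (is.na(surgical_adrg)) "ZZ1" else surgical_adrg
}

resolve_newborn(c("PU1", "PV1"), age_days = 40)
resolve_mdcz(NA_character_)
```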
Step 4. Normal grouping. The MDC was set according to the main diagnosis. The "uni.s" function was used to determine whether the case could be grouped into ADRGs defined by combined surgical operations. Normal grouping was then performed by the "adrg.n" function.
Step 5. Special grouping. If the case was grouped into MDCM, MDCN, MDCS, MDCT, MDCV, MDCW, or MDCX, the scripts would start this step. Sex was used to distinguish whether the MDC was MDCM or MDCN. The operating room procedures were used to determine whether the patients were grouped into SB1, TB1, VC1, WJ1 or XJ1.
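The special grouping logic can be sketched as follows. The function names and the "M"/"F" sex coding are assumptions. The pairing of each MDC with its ADRG (MDCS with SB1, and so on) is inferred from the shared first letter and the parallel ordering in the text, not stated explicitly in the source.

```r
# Choose between MDCM and MDCN by sex ("M"/"F" coding is an assumption).
resolve_reproductive_mdc <- function(mdc, sex) {
  if (!mdc %in% c("MDCM", "MDCN")) return(mdc)
  if (sex == "M") "MDCM" else "MDCN"
}

# ADRGs that depend on an operating room procedure, keyed by MDC (inferred pairing).
or_adrg <- c(MDCS = "SB1", MDCT = "TB1", MDCV = "VC1", MDCW = "WJ1", MDCX = "XJ1")

special_adrg <- function(mdc, has_or_procedure) {
  if (has_or_procedure && mdc %in% names(or_adrg)) {
    unname(or_adrg[mdc])
  } else {
    NA_character_  # no special ADRG applies
  }
}

resolve_reproductive_mdc("MDCN", sex = "M")
special_adrg("MDCW", has_or_procedure = TRUE)
```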
Step 6. ADRG discrimination. According to the ADRG results, the final ADRG was determined in the order of pre-grouping, normal grouping, and special grouping.
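One way to read this step is as a first-match rule over the three intermediate results. The sketch below takes that reading, which is an assumption (the source does not say whether an earlier or a later stage takes precedence), and the function `final_adrg` is hypothetical.

```r
# Return the first non-missing ADRG in the order: pre-grouping, normal, special.
final_adrg <- function(pre, normal, special) {
  candidates <- c(pre, normal, special)
  hit <- candidates[!is.na(candidates)]
  if (length(hit) == 0) NA_character_ else hit[1]
}

final_adrg(pre = NA_character_, normal = "ADRG_N", special = "ADRG_S")
```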
Finally, we integrated steps 2 to 7 into the "drgs.pro" function. Then, the operation module was run to obtain the CHS-DRG simulation grouping results. The result data frame was a 17-column table (Table 2).

Accuracy Analysis Of CHS-DRG Simulation Grouping Results
Using Z-DRG, we performed CHS-DRG simulation grouping of 42523 cases in a tertiary hospital from January to September 2021. Grouping one case took (17.85±0.11) milliseconds on average. CHS-DRG grouping results for 35236 cases of healthcare security patients were downloaded from the regional CHS-DRG operation management system. The comparative analysis showed that the CHS-DRG simulation grouping results were consistent with those of the regional CHS-DRG operation management system for MDCs and ADRGs. There were 34918 cases with consistent results in the DRG subdivision group, giving an overall DRG grouping accuracy rate of 99.10%.
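The reported accuracy follows directly from the two counts above, as the short R check below shows. The case-level function `drg_accuracy` and its column names (`record_no`, `drg`) are hypothetical and only illustrate how such a comparison could be computed from two result tables.

```r
# Counts reported in the text.
n_compared   <- 35236  # cases with regional platform results
n_consistent <- 34918  # cases with the same DRG subdivision group

accuracy <- n_consistent / n_compared
sprintf("%.2f%%", 100 * accuracy)  # "99.10%"
n_compared - n_consistent          # 318 inconsistent cases

# Case-level version: join simulated and platform results by record number.
drg_accuracy <- function(sim, ref) {
  both <- merge(sim, ref, by = "record_no", suffixes = c("_sim", "_ref"))
  mean(both$drg_sim == both$drg_ref)
}

sim <- data.frame(record_no = c("A", "B", "C"), drg = c("D1", "D2", "D3"),
                  stringsAsFactors = FALSE)
ref <- data.frame(record_no = c("A", "B"), drg = c("D1", "D9"),
                  stringsAsFactors = FALSE)
drg_accuracy(sim, ref)  # only records present in both tables are compared
```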
According to the CHS-DRG grouping principle, when the ADRG is consistent but the DRG is not, the root cause lies in a difference in CCs (Table 3). Analysis of the 318 cases with inconsistent DRG subdivision groups showed that Z-DRG underestimated CCs in 85.22% of cases, which might be because the CC (MCC and CC) tables of the regional CHS-DRG integrated service platform deviated from the CHS-DRG 1.0 revision. In the remaining 14.78%, Z-DRG overestimated CCs, which might be because the data used for Z-DRG contained many more other diagnoses.
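The distinction between underestimated and overestimated CCs can be expressed as a comparison of severity ranks. In the sketch below the subgroup labels, the ranking, and the function `compare_cc` are illustrative assumptions. The last line derives the case counts implied by the reported percentages, which are not quoted from the source.

```r
# Severity rank of each subgroup label (labels are placeholders).
cc_rank <- c("without CC" = 0, "with CC" = 1, "with MCC" = 2)

# Classify each case by comparing the simulated level with the reference level.
compare_cc <- function(sim, ref) {
  d <- cc_rank[sim] - cc_rank[ref]
  unname(ifelse(d < 0, "underestimated",
                ifelse(d > 0, "overestimated", "consistent")))
}

compare_cc(sim = c("with CC", "with MCC", "without CC"),
           ref = c("with MCC", "with CC", "without CC"))

# Case counts implied by the reported shares of the 318 inconsistent cases.
round(318 * c(under = 0.8522, over = 0.1478))
```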

Discussion
At present, we found few reports on using R language to establish a CHS-DRG grouping simulation program. Most software vendors chose to develop CHS-DRG simulation grouping programs for medical institution managers at high cost. Additionally, some vendors created mobile single-case grouping tools that help medical record coders obtain a basic grouping from the main disease diagnosis code and the main operation code. During the DRG payment period, a CHS-DRG grouping strategy was needed to help medical institution managers better adapt to and check the regional CHS-DRG grouping results, so as to combine DRG with fine-grained hospital management. For this purpose, we established a set of practical CHS-DRG simulation grouping scripts (Z-DRG) based on R language.
The Z-DRG had five advantages. First, the Z-DRG was easy to run. Medical record coders only needed to import the home page data of medical records into the Z-DRG, which then automatically completed the code conversion and CHS-DRG simulation grouping. The Z-DRG did not require the main diagnosis and main operation information to be entered manually one by one, and it avoided the errors of manual CHS-DRG grouping. Second, the results of Z-DRG were timely. Medical record coders could perform CHS-DRG simulation grouping with Z-DRG at any time without waiting for lagging results from the regional CHS-DRG operation service platform. Timely grouping results could improve the ICD coding level of medical record coders [11] and provide timely data references for medical institutions to carry out DRG management. Third, the Z-DRG was efficient and accurate. The grouping test showed that CHS-DRG simulation grouping of a single case took only 16.6 milliseconds with Z-DRG, so results for a large number of home pages of medical records could be obtained quickly. Compared with the results of the regional CHS-DRG operation service platform, the accuracy rate was 99.10%, which greatly enhanced the confidence of medical record coders in using Z-DRG for CHS-DRG simulation grouping. According to the results of Z-DRG, clinicians should pay more attention to correctly filling in the diagnosis and operation information on the home page of medical records to ensure the accuracy of DRG enrollment [8]. Fourth, the grouping results of Z-DRG were standardized. The Z-DRG extracted fixed key indicators from the home page data of medical records and then performed CHS-DRG simulation grouping to generate the results, including the MDC code, ADRG code, DRG code and their names. When Z-DRG was used for CHS-DRG simulation grouping across different periods, hospitals, departments, or clinicians, it generated standard and comparable results. Fifth, the Z-DRG was cost-effective. The Z-DRG was self-developed based on R language and did not require a large financial outlay from medical institutions.
Of course, the Z-DRG also had shortcomings. The Z-DRG relied on the R language development environment, so users unfamiliar with R faced a higher learning cost in the initial stage. The Z-DRG was applicable to the CHS-DRG 1.0 revision. As the CHS-DRG version continues to iterate, the scripts should be updated in time to maintain grouping accuracy. Therefore, our team would continue to accumulate regional CHS-DRG grouping results and improve the precision of CHS-DRG simulation grouping by introducing machine learning.

Conclusion
In summary, we developed a set of CHS-DRG simulation grouping programs based on R language, namely, Z-DRG, which was easy to operate, timely, and cost-effective. The CHS-DRG simulation grouping results of Z-DRG were accurate, effective, and standardized. It was suitable and valuable for different medical institutions, departments, and clinicians. Subsequent research would integrate machine learning and visualization functions into the current scripts to make the Z-DRG more convenient, faster and more intelligent.
QH and WP collected the home pages of medical records and tested the Z-DRG. All authors read and approved the final manuscript.
Figure caption: CHS-DRG grouping principle framework diagram