To our knowledge, this study was the first to develop and apply regexps in automatic coding, with the specific purpose of improving coding quality and efficiency. We constructed the description models of the regexps and inserted them into the coding system via the Oracle software. The automatic ICD-10 coding system completed more than 160,000 codes in 16 months, which reduced the workload of coders and showed high precision and efficiency.
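The idea of a regexp-based description model can be illustrated with a minimal sketch. It is not the rule base used in this study: the English patterns, the `RULES` list and the `assign_code` function are hypothetical, and only the three category labels (Z51, Z12, Z08) are taken from the discussion below. Each category is paired with one regexp, and a description that matches no rule is left for manual coding.

```python
import re

# Hypothetical rule base: (ICD-10 category, regexp) pairs.
# The patterns are illustrative assumptions, not the study's description models.
RULES = [
    ("Z51", re.compile(r"\b(chemotherapy|radiotherapy)\b.*\b(session|course)\b", re.IGNORECASE)),
    ("Z12", re.compile(r"\bscreening\b.*\b(neoplasm|tumou?r|cancer)\b", re.IGNORECASE)),
    ("Z08", re.compile(r"\bfollow-?up\b.*\bafter\b.*\b(treatment|surgery)\b", re.IGNORECASE)),
]


def assign_code(description):
    """Return the category of the first matching rule, or None if no rule matches."""
    for code, pattern in RULES:
        if pattern.search(description):
            return code
    return None  # unmatched descriptions fall back to manual coding


if __name__ == "__main__":
    for text in [
        "Chemotherapy session for gastric cancer",
        "Screening examination for colon tumour",
        "Chest pain",
    ]:
        print(text, "->", assign_code(text))
```

In this sketch the rules are tried in order and the first match wins, so more specific patterns would need to be listed before more general ones.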
Figure 3 indicates that the code categories are concentrated in the top 100, so refining the corresponding description models of the regexps can reduce the number of FNs and thereby improve the R values. Figure 8 shows that the differences in frequency and in range of variation between the codes are not as large as in Fig. 3, which is the main reason we studied only the top 1000. Figures 4 and 5 show that diseases of the digestive system and the circulatory system are the most diverse and the most numerous in our hospital, which indicates that these two kinds of diseases need more attention when building the description models of the regexps. In addition, class Z diseases are the second largest because our hospital has a large neoplasm treatment centre, which involves many special screening examinations (Z12), follow-up examinations after treatment (Z08) and radiotherapy and chemotherapy sessions (Z51) for neoplasms. Curve A in Fig. 6 represents the diagnosis codes correctly assigned by the automatic coding system in each month of the two testing stages. Despite the downward trend in the first testing stage, the distance between curves A and B remained stable in every month; that is, the number of TPs was stable. The quantity of automatic coding decreased because changes in hospital management left the system not running on some days. Figure 7 shows that automatic coding takes nearly 100 times less time than manual coding, which clearly demonstrates that automatic coding can save much time.
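The concentration of codes in the most frequent categories can be quantified as cumulative coverage: the share of all assigned codes that falls in the k most frequent categories. The following is a minimal sketch with made-up toy counts, not the hospital data; the function name `top_k_coverage` and the placeholder labels `OTHER1` and `OTHER2` are assumptions.

```python
from collections import Counter


def top_k_coverage(codes, k):
    """Fraction of all assigned codes that belong to the k most frequent categories."""
    counts = Counter(codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total


if __name__ == "__main__":
    # Toy data for illustration only (100 codes in 5 categories).
    sample = ["Z51"] * 50 + ["Z12"] * 30 + ["Z08"] * 15 + ["OTHER1"] * 4 + ["OTHER2"] * 1
    for k in (1, 2, 3):
        print(k, top_k_coverage(sample, k))
```

A steep rise in coverage for small k is what justifies modelling the high-frequency categories first.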
The values of P for the first and second testing stages were as high as 89.27% and 88.38%, respectively. However, two main factors resulted in low R, F and A values. First, automatic coding can only be executed when the programmer starts the program. Currently, it can only be run twice a day: at the start of work in the morning (8:00) and in the afternoon (14:30). Because clinicians usually complete the homepage of the discharge medical records at the end of their work, the number of diagnosis descriptions waiting to be coded peaks at these two times, so starting the program then realizes the value of automatic coding well. However, the coders are coding manually at the same time. When the program is not running, the diagnosis descriptions that should have been coded automatically are instead completed by the coders, which leads to too many FNs. The more FNs there are, the smaller the R value, and the smaller the R value, the smaller the F value. Second, of the more than 8000 code categories in our hospital, we matched only the 950 code categories with high frequency; that is, about 7000 code categories with frequencies below 300 were left out. Table 3 shows that the unmodeled code categories produced about 300,000 missing codes in the 16 months from 1 October 2017 to 31 January 2019, which made the number of TNs large. The high negative values correspond to low positive values; that is, the accurately assigned codes are few, and the A values are relatively low. Nevertheless, the values of R, F and A increased in the second testing stage, which shows that expanding the total number of matching codes was effective. Table 4 shows that description models of the regexps could not be established for 50 code categories (within the top 1000), which were mainly concentrated in factors influencing health status and contact with health services (class Z) and neoplasms (classes C and D). In addition, the other code categories are unspecified.
The main reason for this result is that the diagnosis descriptions recorded by clinicians for these diseases are not standardized and vary greatly, so the correct diagnosis cannot be coded until the coders consult the complete electronic medical record. The results suggest that clinicians need to standardize their diagnosis descriptions more strictly when recording diagnoses, especially for class C, D and Z diseases, while programmers and coders should spend more time on these diseases when building models. On the whole, our system has high precision. With the participation of programmers, clinicians and coders, the accuracy of the system can be improved by focusing on the high-frequency diseases and code categories and by repeatedly improving the quality and quantity of the regexps.
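The dependence of R and F on the number of FNs described above can be made concrete with a small calculation. This sketch assumes the standard textbook definitions of precision, recall and F-measure, which may differ in detail from the formulas used in this study; the counts are invented for illustration, and A is left out because its definition is not restated in this section.

```python
def precision_recall_f(tp, fp, fn):
    """Standard precision, recall and F-measure from confusion counts (assumed definitions)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Invented counts: same TPs and FPs, but many more descriptions coded
    # manually while the program was stopped (more FNs) in the second case.
    few_fn = precision_recall_f(tp=900, fp=100, fn=100)
    many_fn = precision_recall_f(tp=900, fp=100, fn=900)
    print("few FNs :", few_fn)
    print("many FNs:", many_fn)
```

With TPs and FPs held fixed, precision is unchanged while recall and the F-measure both fall as FNs grow, which is the pattern reported for the two testing stages.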
Although many studies in recent years have focused on automatic ICD coding, we want to highlight the following advantages of our study. First, in contrast to theoretical studies that validate models on public databases[29–31], we used our own hospital data to build a system that can be applied directly to practical work. Second, coders can identify their own shortcomings and strengthen communication with clinicians during the audit process to improve their accuracy. Third, our hospital receives a large number of doctors for standardized training and refresher courses every year. Our doctors record diagnosis descriptions in a variety of ways, so our description models of the regexps are highly representative and widely applicable. Fourth, the regexps represent rules that can be easily understood by workers, which requires less involvement of experts in system implementation and can improve applicability to small-scale medical institutions with more limited information technology. Fifth, we updated the existing manual coding system with the rule base of regexps to reduce the workload and improve the work quality of coders. The technical requirements and computational cost are lower than those of the other methods found in most studies[7],[11],[32–36]. Convolutional neural networks (CNNs) [18],[34–36] are among the state-of-the-art proposals for automatic ICD coding. Despite their high accuracy, there is still a long way to go before they can be used in practice. Our automatic coding system has been running steadily and addresses the main problem faced by most medical institutions at present: a large volume of repetitive coding. Our system was designed and completed in a relatively short time by our own programmers and runs in a simple environment, whereas the complex methods described above often require the assistance of engineers from an information technology company.
The description models of the regexps that we have established are representative and can be used for reference. Overall, our method can be transferred to other institutions. Programmers can modify these regexps slightly according to the actual situation and write them into an existing coding system.
There are also shortcomings in our study. First, the automatic coding program runs twice a day: once in the morning and once in the afternoon. When the program is not running, coders must input the codes manually. The next step of our study is to explore how to code the diagnosis automatically as soon as the clinician completes the record. Second, coders are required to perform the last step of auditing, so only semi-automation can be achieved. Code auditing places higher demands on the ability of coders, who should continue to participate in relevant professional training and learning. Standardized diagnosis descriptions help to improve the correctness of coding. The ICD-10 classification data for some error-prone codes can be sent to the relevant clinical departments to draw the attention of clinicians to the standardized writing of discharge diagnosis descriptions. Whether a gold standard can be established for auditing automatic coding remains to be studied. Third, it is hard to build description models of the regexps for identical diseases whose diagnosis terms differ too widely. Our study is based on the diagnoses of common diseases (the top 1000) and does not include uncommon diseases. Therefore, in future work, the matching rules need to be improved continually, with the complete ICD-10 coding set as the goal. In addition, the recall, F-measure and accuracy in our study are low compared with the methods mentioned above[34–36]. For example, a CNN-based method reached an F-measure of 60.86% with high efficiency[34], and reference [36], which built a feature matrix with a pretrained word embedding model and used it to train a CNN, reported a high testing accuracy (F-measure 90.86%). Whether our system can be fully automated with high precision by combining it with state-of-the-art methods is a long-term question that we need to consider.