Validation of Algorithms Using ICD-10 Clinical Modification (CM) Codes in Claims Data to Identify People With Viral Hepatitis

Background: The ICD-9 Clinical Modification (ICD-9-CM) coding system was replaced by the ICD-10-CM on October 1, 2015, in the United States and on January 1, 2016, in Taiwan. Little is known about the performance of algorithms using ICD-10-CM codes in claims data to identify people with hepatitis B virus (HBV) and hepatitis C virus (HCV) infection. Methods: Using proportional systematic sampling, we enrolled 10,000 patients aged ≥20 years from a health care system in southern Taiwan. According to the reference standards, 726 and 555 participants were confirmed to have HBV and HCV, respectively. Results: Algorithms requiring more outpatient (OP) visits with ICD codes had a higher positive predictive value (PPV); for example, the PPV for HBV using algorithm 1 (≥1 OP codes) was 72% and 86% with ICD-9-CM and ICD-10-CM, respectively, and that of algorithm 3 (≥3 OP codes) was 80% and 90%, respectively. Similarly, the PPV for HCV using algorithm 1 was 88% and 96%, respectively, and using algorithm 3 was 93% and 99%, respectively. However, the gain in PPV came at the cost of lower sensitivity. Conclusions: Algorithms using ICD-10-CM codes performed better than those using ICD-9-CM codes in identifying people with HBV and HCV. Considering the tradeoff between PPV and sensitivity, the optimal algorithm is ≥2 OP visits or ≥1 inpatient (IP) visit with HBV or HCV ICD codes.
Abbreviations: ICD-9-CM = International Classification of Diseases, Ninth Revision, Clinical Modification; ICD-10-CM = International Classification of Diseases, Tenth Revision, Clinical Modification; IP = inpatient; NPV = negative predictive value; OP = outpatient; PPV = positive predictive value; Sen = sensitivity; Spe = specificity


Background
An increasing number of studies have used real-world data to assess the effectiveness and safety of antiviral medications among people with viral hepatitis [1][2][3][4][5]. One critical step in using real-world data is applying International Classification of Diseases (ICD) codes to identify people with a given disease, either as the main health outcome or as a covariate [6,7]. Checklist item 6.2 of the REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement states that "any validation studies of the codes or algorithms used to select the population should be referenced" [8]. However, only five studies have examined the validity of algorithms using ICD codes to identify people with hepatitis B virus (HBV) or hepatitis C virus (HCV) infection [9][10][11][12][13]. The ICD Ninth Revision Clinical Modification (ICD-9-CM) coding system was replaced by the ICD Tenth Revision Clinical Modification (ICD-10-CM) on October 1, 2015, in the United States and on January 1, 2016, in Taiwan. Although some studies have examined the impact of this transition on particular diseases [14][15][16][17][18], no study has evaluated its impact on viral hepatitis. Therefore, this study aimed to compare the performance of algorithms using ICD-9-CM codes versus ICD-10-CM codes in identifying people with HBV and HCV.

Design and setting
This cross-sectional study was conducted at the Chi Mei health care system in Tainan, Taiwan. The Chi Mei health care system is the largest integrated health care system in southern Taiwan and includes one medical center (Yongkang), one regional hospital (Liouying), and one district hospital (Jiali). This study was approved by the Institutional Review Board of the Chi Mei Medical Center (number: 10901-015).

Study participants
Using proportional systematic sampling, we enrolled 10,000 patients aged ≥20 years who had at least four visits to the Chi Mei health care system in 2015. We randomly sampled 6000 patients from the medical center, 2500 from the regional hospital, and 1500 from the district hospital.
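The allocation and selection steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the study's actual procedure: the per-hospital sampling frames (eligible patient IDs) and their sizes are invented so that allocation in proportion to frame size reproduces the reported 6000/2500/1500 split, and within each hospital the sketch uses textbook systematic selection (every k-th patient after a random start), because the text does not describe how the frames were ordered or how the start was chosen. The helper names `proportional_quotas` and `systematic_sample` are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def proportional_quotas(sizes: Dict[str, int], total_n: int) -> Dict[str, int]:
    """Split total_n across strata in proportion to stratum size (largest-remainder rounding)."""
    population = sum(sizes.values())
    raw = {s: total_n * size / population for s, size in sizes.items()}
    quotas = {s: int(r) for s, r in raw.items()}
    shortfall = total_n - sum(quotas.values())
    # Hand leftover units to the strata with the largest fractional parts.
    for s in sorted(raw, key=lambda name: raw[name] - quotas[name], reverse=True)[:shortfall]:
        quotas[s] += 1
    return quotas


def systematic_sample(frame: Sequence[str], n: int, rng: random.Random) -> List[str]:
    """Take every k-th unit (k = len(frame) // n) after a random start in [0, k)."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


# Hypothetical sampling frames: eligible patient IDs per hospital (sizes are invented).
frames = {
    "Yongkang": [f"Y{i:05d}" for i in range(60000)],
    "Liouying": [f"L{i:05d}" for i in range(25000)],
    "Jiali": [f"J{i:05d}" for i in range(15000)],
}
quotas = proportional_quotas({site: len(ids) for site, ids in frames.items()}, 10000)
rng = random.Random(2015)  # fixed seed so the draw is reproducible
sample = {site: systematic_sample(frames[site], quotas[site], rng) for site in frames}
print(quotas)  # {'Yongkang': 6000, 'Liouying': 2500, 'Jiali': 1500}
```

Using an integer interval k = N // n keeps every index in range; when N is not a multiple of n, the last few units of a frame can never be selected, which is a known simplification of this scheme.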

Reference standard
Patients were defined as having HBV or HCV if they fulfilled any of the following criteria: 1) a prescription of anti-HBV or anti-HCV drugs reimbursed by the National Health Insurance (NHI); 2) a positive HBsAg, HBeAg, or anti-HCV test result; or 3) a clinical diagnosis of HBV or HCV in the discharge summary (inpatients) or in the problem list or past history (outpatients). A query system (Hyperion) was used to review all electronic medical records of each participant from 2005 through 2019.
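The reference-standard rule amounts to a logical OR over the three criteria. In the Results, the per-source counts (192 + 218 + 316 = 726 for HBV; 177 + 275 + 103 = 555 for HCV) add up to the totals, which suggests each positive patient was attributed to a single source. The sketch below assumes a precedence of drug prescription, then laboratory test, then clinical diagnosis; that order, the `Evidence` record, and the function names are our assumptions, not taken from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical per-patient summary of EMR evidence (2005-2019) for one infection."""
    reimbursed_antiviral: bool  # criterion 1: NHI-reimbursed anti-HBV or anti-HCV drug
    positive_lab: bool          # criterion 2: positive HBsAg/HBeAg (HBV) or anti-HCV (HCV)
    clinical_diagnosis: bool    # criterion 3: discharge summary, problem list, or past history


def reference_source(ev: Evidence) -> Optional[str]:
    """Return the first criterion met, checked in the assumed order 1 -> 2 -> 3; None if negative."""
    if ev.reimbursed_antiviral:
        return "drug prescription"
    if ev.positive_lab:
        return "laboratory test"
    if ev.clinical_diagnosis:
        return "clinical diagnosis"
    return None


def is_reference_positive(ev: Evidence) -> bool:
    """A patient is reference-positive if ANY of the three criteria is met."""
    return reference_source(ev) is not None


print(reference_source(Evidence(False, True, True)))  # laboratory test
```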

Coding algorithms
The ICD-9-CM and ICD-10-CM codes for HBV and HCV are listed in Table 1. To assess the performance of ICD codes in identifying patients with HBV or HCV infection, we calculated sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). We compared these four validation indicators between ICD-9-CM and ICD-10-CM codes across demographic characteristics.
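The four indicators come from the 2×2 table of algorithm result against reference standard. The minimal Python sketch below uses the standard definitions; the function names and the example counts are invented for illustration and do not correspond to any row of Table 3.

```python
import math
from typing import Dict, Iterable, Tuple


def confusion_counts(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, int]:
    """Tally (algorithm_positive, reference_positive) pairs into a 2x2 table."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for algo, ref in pairs:
        if algo and ref:
            counts["TP"] += 1
        elif algo:
            counts["FP"] += 1
        elif ref:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def validation_indicators(c: Dict[str, int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV; NaN when a denominator is zero."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(c["TP"], c["TP"] + c["FN"]),  # coded among reference-positive
        "specificity": ratio(c["TN"], c["TN"] + c["FP"]),  # uncoded among reference-negative
        "PPV": ratio(c["TP"], c["TP"] + c["FP"]),          # reference-positive among coded
        "NPV": ratio(c["TN"], c["TN"] + c["FN"]),          # reference-negative among uncoded
    }


# Invented example: 90 TP, 10 FP, 60 FN, 840 TN.
pairs = [(True, True)] * 90 + [(True, False)] * 10 + [(False, True)] * 60 + [(False, False)] * 840
ind = validation_indicators(confusion_counts(pairs))
```

Sensitivity and NPV need the reference-positive patients whom the algorithm missed (FN), which is why they can only be estimated from a sample drawn independently of the codes.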

Results
Of the 10,000 patients enrolled in this study, 726 had confirmed HBV and 555 had confirmed HCV. Of these, 192 and 177, respectively, were confirmed by drug prescriptions, 218 and 275, respectively, by laboratory test results, and 316 and 103, respectively, by textual clinical diagnoses (Fig. 1).
The demographic characteristics of all patients, with or without HBV/HCV, are summarized in Table 2.
Three-fifths of patients with HBV and half of those with HCV were men. The HCV group had a higher proportion of older adults (aged ≥70 years) than the HBV group (51% versus 29%).
Table 3 summarizes the performance of the nine algorithms in identifying people with HBV and HCV. Algorithms requiring more visits with ICD codes had a higher PPV; for example, the PPV for HBV using algorithm 1 (≥1 OP codes) was 72% and 86% with ICD-9-CM and ICD-10-CM, respectively, and that of algorithm 3 (≥3 OP codes) was 80% and 90%, respectively. Similarly, the PPV for HCV using algorithm 1 was 88% and 96%, respectively, and using algorithm 3 was 93% and 99%, respectively. For each algorithm, performance was better for identifying people with HCV than for identifying people with HBV. However, as the PPV increased, the sensitivity declined. For example, the sensitivity of algorithm 1 for HBV with ICD-9-CM and ICD-10-CM was 62% and 54%, respectively, and that of algorithm 3 was 53% and 45%, respectively (Table 3). Considering the tradeoff between PPV and sensitivity, algorithm 7 (≥2 OP or ≥1 IP codes) was deemed optimal for both HBV and HCV with both ICD-9-CM and ICD-10-CM.
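To make the visit-count rules concrete, the sketch below encodes the four algorithms whose definitions appear in the text (algorithms 1, 2, 3, and 7); the other five are defined only in Table 3 and are not reproduced. It assumes, hypothetically, that each visit is summarized by its setting (OP or IP) and a flag for whether any target code from Table 1 was recorded; any calendar window on the visits is ignored. The `Visit` type and `coded_visits` helper are illustrative names.

```python
from typing import Callable, Dict, List, NamedTuple


class Visit(NamedTuple):
    setting: str           # "OP" (outpatient) or "IP" (inpatient)
    has_target_code: bool  # True if an HBV (or HCV) code from Table 1 was recorded at this visit


def coded_visits(visits: List[Visit], setting: str) -> int:
    """Number of visits in the given setting that carry a target ICD code."""
    return sum(1 for v in visits if v.setting == setting and v.has_target_code)


# Only the rules spelled out in the text; the remaining algorithms are defined in Table 3.
ALGORITHMS: Dict[str, Callable[[List[Visit]], bool]] = {
    "algorithm 1 (>=1 OP)": lambda vs: coded_visits(vs, "OP") >= 1,
    "algorithm 2 (>=2 OP)": lambda vs: coded_visits(vs, "OP") >= 2,
    "algorithm 3 (>=3 OP)": lambda vs: coded_visits(vs, "OP") >= 3,
    "algorithm 7 (>=2 OP or >=1 IP)": lambda vs: coded_visits(vs, "OP") >= 2 or coded_visits(vs, "IP") >= 1,
}

# Hypothetical patient: one coded OP visit, one uncoded OP visit, one coded IP stay.
patient = [Visit("OP", True), Visit("OP", False), Visit("IP", True)]
flags = {name: rule(patient) for name, rule in ALGORITHMS.items()}
```

Every patient flagged by a stricter OP threshold is also flagged by a looser one, so the stricter rule's true positives are a subset of the looser rule's and its sensitivity cannot be higher. A higher PPV is not guaranteed by this nesting, but it is what Table 3 shows. The IP clause of algorithm 7 recovers patients with few coded outpatient visits but a coded inpatient stay, such as the example patient.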
The validation indicators for algorithm 7 across demographic characteristics are shown in Table 4. No significant differences in the indicators were noted between men and women. However, the indicators showed opposite age patterns for the two infections: for HBV, the PPV decreased with age with both ICD-9-CM and ICD-10-CM codes, whereas for HCV, the PPV increased with age with both code sets.
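Stratified indicators such as those in Table 4 come from building a separate 2×2 table within each subgroup. The sketch below does this for PPV by age band. The band cut-offs (other than the ≥70-year group mentioned in the text) and the patient rows are invented, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def age_band(age: int) -> str:
    """Hypothetical age bands; only the >=70 group is named in the text."""
    if age >= 70:
        return ">=70"
    if age >= 45:
        return "45-69"
    return "20-44"


def ppv_by_stratum(rows: Iterable[Tuple[str, bool, bool]]) -> Dict[str, float]:
    """rows: (stratum, algorithm_positive, reference_positive). Returns TP / (TP + FP) per stratum.
    Strata in which no patient is flagged are omitted because their PPV is undefined."""
    flagged: Dict[str, int] = defaultdict(int)
    true_pos: Dict[str, int] = defaultdict(int)
    for stratum, algo, ref in rows:
        if algo:
            flagged[stratum] += 1
            if ref:
                true_pos[stratum] += 1
    return {s: true_pos[s] / flagged[s] for s in flagged}


# Invented patients: (age, flagged by algorithm 7, reference-positive)
patients = [(35, True, True), (40, True, True), (62, True, False),
            (75, True, True), (81, True, False), (90, False, False)]
ppv = ppv_by_stratum((age_band(a), algo, ref) for a, algo, ref in patients)
```

The same per-stratum tally extends to sensitivity, specificity, and NPV by keeping all four cells instead of only the flagged ones.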

Discussion
The findings of this study suggest that ICD-10-CM codes performed better than ICD-9-CM codes in identifying people with HBV and HCV. The algorithms also performed better in identifying people with HCV than people with HBV. Across the nine algorithms examined, gains in PPV came at the cost of losses in sensitivity; considering this tradeoff, the optimal algorithm was ≥2 OP or ≥1 IP codes.
The Centers for Disease Control and Prevention Chronic Hepatitis Cohort Study in the United States used an algorithm requiring two ICD-9 codes separated by ≥6 months, which had a PPV of 90% for HBV and 92% for HCV [11,12]. In this study, the PPV of algorithm 2 (≥2 OP codes) was 77% and 88% with ICD-9-CM and ICD-10-CM, respectively, for HBV, and 91% and 98%, respectively, for HCV.
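The CDC cohort's rule differs from algorithm 2 in requiring the two codes to be separated in time. A minimal sketch of such a rule follows. How "6 months" was operationalized is not stated here, so the 183-day threshold, like the function name `two_codes_apart`, is an assumption.

```python
from datetime import date, timedelta
from typing import Iterable

SIX_MONTHS = timedelta(days=183)  # assumption: "6 months" approximated as 183 days


def two_codes_apart(code_dates: Iterable[date], min_gap: timedelta = SIX_MONTHS) -> bool:
    """True if at least two coded encounters are >= min_gap apart.
    The widest gap between any pair of dates is latest minus earliest, so only that pair is checked."""
    dates = sorted(code_dates)
    return len(dates) >= 2 and dates[-1] - dates[0] >= min_gap


print(two_codes_apart([date(2015, 1, 5), date(2015, 3, 1)]))                    # False (55 days)
print(two_codes_apart([date(2015, 1, 5), date(2015, 3, 1), date(2015, 8, 1)]))  # True (208 days)
```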
The better performance of algorithms using ICD-10-CM codes than of those using ICD-9-CM codes is unlikely to be due to the codes themselves, because the classification scheme and number of codes for HBV and HCV do not differ much between the two revisions. A more likely explanation is the education and training that the NHI offered before ICD-10-CM was implemented in January 2016, together with subsequent quality improvement programs.
Our algorithms performed better in identifying people with HCV than people with HBV, especially with ICD-10-CM codes. This was likely because the NHI has covered direct-acting antivirals (DAAs) for people with HCV since January 24, 2017 [19], and physicians were required to record ICD-10-CM codes for HCV when prescribing DAAs.
The performance of our algorithms was better than that reported in a previous Taiwanese study [13], possibly for two reasons. First, we used more data sources (drug prescriptions, laboratory results, and clinical diagnoses) and a longer study period (2005-2019) than the previous study did (laboratory results for one quarter in 2018). Some people with HBV or HCV ICD codes judged as false positives in previous studies might have been judged as true positives in this study because more evidence was available. Second, this study was confined to one health care system with three hospitals with relatively high coding quality, whereas the previous study covered thousands of hospitals and clinics across Taiwan.
This study has several strengths. First, the sample size was large. Unlike some previous studies that used ICD codes to recruit patients and thus allowed only PPV estimation [9,10], because patients without codes were never sampled, this study used systematic sampling, so sensitivity could also be calculated. Second, we used a search engine to identify clinical diagnoses across large volumes of electronic medical records. Third, this study is the first to examine the performance of ICD-10-CM code algorithms in identifying people with HBV and HCV. Fourth, we evaluated nine algorithms, compared with only one in the previous Taiwanese study.
Nevertheless, our study had several limitations. First, it was confined to one health care system in southern Taiwan, which might limit generalizability to other populations. However, the main findings (better performance of ICD-10-CM than ICD-9-CM codes, and for HCV than for HBV) were likely driven by nationwide contextual factors (the NHI education and training program and reimbursement of DAAs). Therefore, we believe that these conclusions may be applicable to other clinical settings in Taiwan. Second, some patients might have had positive laboratory results in other hospitals but were not tested in this health care system, making them false negatives under our reference standard. Third, some clinical diagnoses recorded by physicians might not be valid.
In conclusion, using the electronic medical records of 10,000 patients selected by proportional systematic sampling from a health care system in southern Taiwan, this study suggests that algorithms using ICD-10-CM codes performed better than those using ICD-9-CM codes in identifying people with HBV and HCV. Considering the tradeoff between PPV and sensitivity, the optimal algorithm was ≥2 outpatient or ≥1 inpatient visits with HBV or HCV ICD codes. Furthermore, ICD codes identified people with HCV better than people with HBV.
Not applicable.

Availability of data and materials
The data are available from the corresponding author upon reasonable request.

Competing interests
None for all authors.

Funding
This study was financially supported by Chi Mei Medical Center (grant number CMFHR10911).
Fig. 1. Flowchart for determining whether the study participants had hepatitis B or C