The ICD-11 Field Trial: Creating a Large Dually Coded Database

Objective: New codes developed in the International Classification of Diseases, 11th Revision for Mortality and Morbidity Statistics (ICD-11) needed testing. Field testing involves real-world application of the new codes to examine data quality. This paper describes the field trial methods used to create a dually coded database to field test ICD-11 against ICD-10-CA (a Canadian modification of the International Classification of Diseases, Tenth Revision), and a reference standard data set of diagnoses.
Results: A random sample of discharge records previously coded using ICD-10-CA was selected. Nurses re-examined these entire charts for specific conditions and patient safety events. Clinical coders re-coded the same charts using ICD-11 codes. Inpatients discharged from hospitals in Calgary, Alberta, were identified and a dually coded database was created (n=2897). Inter-rater reliability and coding time improved with ICD-11 coding experience. Clinical coder comments enabled content to be improved in the ICD-11 browser, Coding Tool, and ICD-11 Reference Guide. This paper describes the field trial, database creation methods, and contributions for ICD-11 improvement. Crucial future research will use this database to test ICD-11 before implementation in Canada.


Introduction
Coded health data are important for many applications, including health services funding, physician payment, and research (1). The quality and usability of updated International Classification of Diseases (ICD) codes need to be examined before use.

The World Health Organization (WHO) introduced the Beta Version of ICD for Mortality and Morbidity Statistics, 11th Revision (ICD-11) for public discussion in 2012. New features include: 1) flexible code clustering allowing rich descriptions of complex clinical scenarios; 2) a new extension code chapter for enhanced detail on disease severity, progression, and timing; and 3) a digital ICD-11 browser and coding tool to facilitate searching and linking to electronic health records (EHRs) (1,2). The WHO encouraged systematic testing of ICD-11 before its release to the public in June 2018.
In 2014, the WHO designated the University of Calgary as an international academic Collaborating Centre (WHO-CC). This WHO-CC undertook a large-scale field trial that served to inform further ICD-11 development through real-world coding.
This report outlines the methods used to test ICD-11 with a database dually coded in ICD-10-CA (Canadian modification) and ICD-11, using chart review as a reference. A database with gold standard labelled records is essential for validating code accuracy and for data mining of specified conditions. Without condition labels, validity cannot be calculated. This database will be used for future studies involving mapping codes, comparability across ICD versions, improving codes and definitions for ICD-11, and trend analysis of diseases.

Materials And Methods
We generated and linked three data sets: 1) a retrospective clinical chart review as a reference standard; 2) original ICD-10-CA coded data; and 3) re-coded ICD-11 coded data (Figure 1). To date, data collection is complete, and analyses are underway.

Sample Size and Cohort
Based on previous findings (3) on sensitivity and prevalence of conditions in a sample of ICD-10-CA data, 3000 records were deemed necessary to test a 10% difference in sensitivity of common conditions such as myocardial infarction (12.8%), cardiac arrhythmia (21.8%), hypertension (30.2%), and others. Lachenbruch's midpoint method (4) was used.
The study cohort included a random sample of discharges selected from records between January 1 and June 30, 2015, from three hospitals in Calgary, Alberta. Patients were between 18 and 104 years of age with a valid Personal Health Number (PHN) for Alberta. Obstetric admissions were excluded due to short stay and absence of chronic conditions of interest. The first 1100 records from each hospital with the lowest random chart numbers were selected. If there were multiple discharges for a single patient during the study period, we randomly selected one discharge record per patient. The additional 100 records per site allowed for missing or excluded charts.
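
The cohort selection described above can be sketched in Python as follows. This is a minimal illustration, not the study's actual procedure: the field names (`patient_id`, `hospital`, `age`, `obstetric`), the function name `select_sample`, the seed, and the order of the steps (eligibility filter, then one discharge per patient, then the lowest random numbers per hospital) are all assumptions made for the example.

```python
import random
from collections import defaultdict


def select_sample(discharges, per_site=1100, seed=2015):
    """Return {hospital: [discharge, ...]} with at most per_site records each.

    discharges is a list of dicts with the hypothetical keys
    'patient_id', 'hospital', 'age', and 'obstetric'.
    """
    rng = random.Random(seed)

    # Eligibility: adults aged 18 to 104, obstetric admissions excluded.
    eligible = [d for d in discharges
                if 18 <= d["age"] <= 104 and not d["obstetric"]]

    # Keep one randomly chosen discharge per patient.
    by_patient = defaultdict(list)
    for d in eligible:
        by_patient[d["patient_id"]].append(d)
    one_per_patient = [rng.choice(records) for records in by_patient.values()]

    # Attach a random number to each record and group by hospital.
    by_site = defaultdict(list)
    for d in one_per_patient:
        by_site[d["hospital"]].append((rng.random(), d))

    # Take the records with the lowest random numbers at each hospital.
    sample = {}
    for site, numbered in by_site.items():
        numbered.sort(key=lambda pair: pair[0])
        sample[site] = [d for _, d in numbered[:per_site]]
    return sample
```

With `per_site=1100` and three hospitals this would yield up to 3300 records, matching the 3000 required plus 100 spare charts per site.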

Chart Review Dataset
Internal validation of a dually coded database involves measuring how well codes, selected from both ICD-10-CA and ICD-11, represent the diagnoses identified by chart reviewers, in terms of sensitivity, specificity, and positive and negative predictive values.
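
For a single condition, these four measures follow from a two-by-two comparison of the coded data against the chart review reference. The sketch below uses the standard definitions; the function name `validity_metrics` and the input layout (one boolean per record for each source) are illustrative assumptions, not part of the study's analysis code.

```python
def validity_metrics(code_flags, chart_flags):
    """Compare coded data against the chart review reference for one condition.

    code_flags[i]  is truthy if the condition was coded in record i.
    chart_flags[i] is truthy if chart review found the condition in record i.
    """
    tp = fp = fn = tn = 0
    for coded, charted in zip(code_flags, chart_flags):
        if coded and charted:
            tp += 1          # coded and confirmed by chart review
        elif coded:
            fp += 1          # coded but not found in the chart
        elif charted:
            fn += 1          # found in the chart but not coded
        else:
            tn += 1          # absent from both sources

    def ratio(numerator, denominator):
        # Return None when a measure is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Running the same function once with ICD-10-CA flags and once with ICD-11 flags against the same chart review flags would give directly comparable estimates for the two code sets.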

Data Dictionary
We replicated and expanded the chart review approach from our prior study on the validity of ICD-10-CA (3). We selected 51 medical conditions, including the Charlson and Elixhauser (5,6) comorbidity conditions, and up to three harms (Table 1). We chose these conditions from other validation studies (3,7) and they are commonly used for risk adjustment. Some definitions were based on literature (5,6) and our prior validation study (3). Where no published definition was available, ICD-11 Browser definitions (beta version) were used (2). Chart review conditions are listed in Table 1.

Chart Access
Patient charts were available in paper and electronic (hybrid) form in each hospital's health records department. Electronic content was accessed in Sunrise Clinical Manager™ (SCM).

Chart Review Team
Six nurse chart reviewers underwent extensive training on the data extraction process by the research coordinator. Training involved learning the data dictionary definitions and following a consistent order to review the chart documents. To test the data definitions, training included practice identifying the medical conditions in the same five charts. Discrepancies between the reviewers were discussed and the data dictionary was refined. We then proceeded with inter-rater reliability (IRR) testing, explained below.
The nurse reviewers examined the entire chart for the presence of specific health conditions. These reviewers were blinded to the ICD codes assigned by the coders.

ICD-10-CA Coded Dataset
We used previously coded charts because the existing ICD-10-CA dataset represented a "real-life" sample of coding practices. Alberta hospitals employ trained clinical coders (CCs) (i.e., nationally certified health information management specialists) who read through patient hospital charts. These CCs assigned ICD-10-CA diagnosis codes to describe each patient's hospitalization, based on ICD-10-CA Canadian coding standards (8). Each discharge record contains a unique identification number for each admission and up to 25 fields for diagnosis codes, which became the study dataset.

Re-coded ICD-11 Dataset
The third phase involved re-coding the same inpatient charts using ICD-11.

Training Material Development
The research coordinator and a member of the Canadian Institute for Health Information (CIHI) developed ICD-11 training materials. Materials included three slide sets covering ICD-11 concepts and tools (9).

Clinical Coding Team
Six professional CCs were hired and trained. Trainers included a team from the University of Calgary, CIHI, and WHO experts in ICD-11 concepts. Training involved 20 classroom hours and approximately 40 hours of coding practice homework prior to coding full hospital charts. The coding team and trainers then met monthly during the coding phase to discuss coding issues. Given that ICD-11 coding rules were limited, ICD-11 coding decisions were based on the materials available at the time: the draft WHO ICD-11 Reference Guide (11), the WHO ICD-11 Coding Tool (12), and the Canadian ICD-10-CA coding standards (8).

Analysis
Test Inter-rater Reliability (IRR) of Chart Review
To test agreement between reviewers, two nurses reviewed the same sets of 10 charts. Agreement was checked for the presence of the 17 Charlson conditions. Where agreement was poor (kappa < 0.60), retraining took place and chart review resumed in batches of 10 charts until agreement was high (kappa > 0.8) (13). High agreement was reached after two reviewers completed 49 sets of records. Reviewers then independently extracted data from the remaining charts over several months. Data were entered into a secure electronic data collection tool called REDCap (version 7.6.9, ©2018 Vanderbilt University). IRR was not available for the previously coded ICD-10-CA dataset.
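
Agreement between two reviewers can be quantified with Cohen's kappa, which corrects observed agreement for agreement expected by chance. The sketch below implements the standard formula and applies the thresholds reported above. The function names are illustrative, and the "continue" label for kappa between 0.60 and 0.8 is our reading of the text (paired review continues until agreement is high), not a rule stated by the authors.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")
    n = len(rater_a)

    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def batch_decision(kappa, poor=0.60, high=0.8):
    """Map a batch kappa to the next step, using the thresholds in the text."""
    if kappa < poor:
        return "retrain"
    if kappa > high:
        return "proceed independently"
    return "continue paired review"
```

For example, each reviewer's presence or absence flags for one Charlson condition across a batch of 10 charts could be passed as `rater_a` and `rater_b`. Because the function compares arbitrary labels, it would also apply to categorical values such as a main condition code.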

Test Inter-rater Reliability of ICD-11 Coded Charts
IRR involved 60 full charts coded by two CCs, similar to the above chart review IRR process. IRR focused on consistent coding of the main condition, given the bulk of possible codes generated from full hospital charts. After the first 40 charts, a kappa of 0.50 was reached on the main condition parent code (highest level in the ICD-11 condition hierarchy). Training continued, differences were discussed, and experts were engaged for guidance. After coding the next 20 charts, a kappa of 0.88 was reached for main condition parent codes and independent coding commenced. The CCs were blinded to the original ICD-10-CA codes and the chart review data.

Results
This paper describes the methods for creating a dually coded database to test ICD-11. Results include the final number of records in the database, coding time, and recommendations made to the WHO to improve the ICD-11 browser.
The chart review sample included 3045 records. The sample of charts coded with both ICD-10-CA and ICD-11 was 3011. The final sample for the dually coded database, with complete data for all three data sets, was n=2897. Complete meant that all chart review fields were filled, and coded charts had at least one diagnosis code.
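
The completeness rule above amounts to an inner join of the three data sets followed by a filter. The sketch below shows one way to express it, assuming each data set is held as a dictionary keyed by a shared record identifier. The function name, the dictionary layout, and the use of `None` or an empty string to mark an unfilled chart review field are illustrative assumptions.

```python
def build_dual_database(chart_review, icd10ca_codes, icd11_codes):
    """Link three data sets by record ID and keep only complete records.

    chart_review:  {record_id: {field_name: value}}
    icd10ca_codes: {record_id: [ICD-10-CA diagnosis codes]}
    icd11_codes:   {record_id: [ICD-11 diagnosis codes]}
    """
    linked = {}
    # Only records present in all three data sets can be linked.
    shared_ids = chart_review.keys() & icd10ca_codes.keys() & icd11_codes.keys()
    for record_id in sorted(shared_ids):
        review = chart_review[record_id]
        codes10 = icd10ca_codes[record_id]
        codes11 = icd11_codes[record_id]

        # Complete: every chart review field filled, and at least one
        # diagnosis code in each coded data set.
        review_filled = all(value not in (None, "") for value in review.values())
        if review_filled and codes10 and codes11:
            linked[record_id] = {
                "chart_review": review,
                "icd10ca": list(codes10),
                "icd11": list(codes11),
            }
    return linked
```

Note that a condition recorded as absent (for example `False` or `0`) still counts as a filled field under this rule.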

Recommendations made for the ICD-11 Reference Guide and ICD-11
This study enabled feedback to the WHO on the new codes and Reference Guide. Changes were integrated into the ICD-11 Browser prior to its release in June 2018. Changes to the ICD-11 Reference Guide for the morbidity-related chapters included improved clinical definitions, and expanded instructions on cluster coding and post-coordination (11). Substantial content was added to clarify Chapter 23, External Causes of Morbidity and Mortality. As such, the ICD-11 Reference Guide now includes a framework and guidelines for using the three-part model to code healthcare-related harms (11). ICD-11 improvements included resolving missing codes and inclusion terms, post-coordination linkages, code choices when documentation was ambiguous, the substance/medication list, coding harms with the three-part model, and features and functions of the Coding Tool. Example ICD-11 changes are listed [see Additional file 2].

Discussion
This detailed set of methods is available for use in other countries testing and adopting ICD-11. Our coding team was the first to code a large number of full hospital discharge records with ICD-11. Developing this dually coded database greatly contributed to refining the new classification system for all stakeholders to benefit.

Learning Points: ICD-11 Coding Training
Challenges with inter-rater reliability for ICD-11 coding were multifactorial. The comprehensive ICD-11 contains 55,000 unique codes (14) and thus offers more code choices than ICD-10-CA, which contains only 12,420 codes (15). Also, at the time of data collection, codes and coding procedures for ICD-11 were still under revision, making training and learning challenging. The CCs required training for new code structures such as code clustering (16). While information sheets from the WHO Education and Implementation Committee (EIC) were referenced (17), more training materials needed to be developed by our team. Even with these resources, systematic training for coding specialists on the coding of complex case scenarios was challenging. Of note, new ICD-11 training materials are now available from the WHO (18) and the EIC (17), and the ICD-11 Browser, Reference Guide, and Coding Tool have been refined (2).

Learning Points: Chart Review Data Collection Quality
Training clinical chart reviewers is crucial for reliable reference data. For optimal data collection, we ensured chart reviewers had the clearest possible data definitions and a set of steps to systematically locate conditions within the chart. The team often collected data in a collaborative environment to resolve discrepancies in real time. A significant contribution of this study is our detailed data definition dictionary, given that minimal definitions were previously published for the Charlson and Elixhauser conditions (5,6). These definitions offer potential standard definitions for future research.
Incomplete chart documentation was a real-world problem during coding and nurse chart review, which caused unavoidable gaps in data collection. We anticipate that documentation quality will improve as our health system transitions to electronic records, potentially enhancing chart completeness.

Conclusion
This paper describes the methods for creating a dually coded ICD-11 and ICD-10-CA database. The study was timely and provided recommendations for ICD-11 enhancement prior to its public release. These methods can be replicated for other code detection and validation studies worldwide. Future studies using this dually coded database and reference standard will examine ICD-11 code features, as well as ICD-11 coded data validity for common clinical conditions.

Limitations
Initially, IRR for ICD-11 coding was low between CCs but improved with discussion and re-training. Common reasons for low IRR may include 1) more codes to choose from in ICD-11 and 2) limited formal guidelines and reference materials. Re-testing IRR and retraining at various intervals would further strengthen IRR. While ICD-10-CA codes were collected in a "real-life" setting with various CCs, ICD-11 codes were collected in a controlled research setting with a small pool of trained CCs. Previously coded ICD-10-CA data were chosen to reduce resource use in the current study and to reflect typically collected coded data.

Availability of data and material
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Figure 1
Steps for creating a chart review reference dataset and a dually coded dataset
Coding Over Time with ICD-11
