Main findings
This study aimed to validate IBD-related surgical procedure codes in the NPR and to estimate the PPV, sensitivity and specificity of those codes. We reviewed the charts of 262 randomly selected patients in Sweden, of whom 57 (22%) underwent IBD surgery registered in the NPR between 1966 and 2014. Of these, 4 were excluded because the patient charts contained insufficient data on the type of surgery to allow reliable validation. For the remaining 53 patients, 158 codes were registered in the NPR. Of these 158, 155 (representing 60 different surgical procedure codes) were also present in the patient charts and were validated using the charts as the gold standard. Our study showed a PPV of 96.8% for concordant codes (n=153) registered in the NPR and a sensitivity for any of the validated codes (n=155) of 94.5%.
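As a minimal illustration of how these two accuracy measures are defined, the sketch below computes the PPV from the counts reported above (153 concordant codes out of 158 registered). The function names are hypothetical and chosen only for this example. The number of chart-documented procedures underlying the sensitivity estimate is not restated in this section, so the sensitivity function is defined but not applied to study data.

```python
def ppv(confirmed, registered):
    """Share of registry codes that the chart (gold standard) confirms."""
    return confirmed / registered


def sensitivity(found_in_registry, performed_per_chart):
    """Share of chart-documented procedures that have a registry code."""
    return found_in_registry / performed_per_chart


# Counts reported in the text: 153 concordant codes of 158 registered in the NPR.
ppv_npr = ppv(153, 158)
print(f"PPV: {ppv_npr:.1%}")  # PPV: 96.8%
```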
Comparison to other studies
Very few studies have validated the quality of data on surgical procedure codes in the NPR. Lagergren and Derogar found an overall PPV of 99.6% (n=1358) for oesophageal cancer surgery[17]; Falkeborn et al, assessing gynaecological surgery, found PPVs ranging from 86% to 100% (n=1338) depending on the type of surgery[21]; and in the most recent study, Tao et al reported an overall PPV of 97.0% (n=572) for obesity surgery codes[18]. Outside Scandinavia, Ma et al reported PPVs from 80% to 100% (n=113) for surgical resection procedure codes in patients with CD registered in the Calgary Health Zone discharge administrative database[5]. Our results are similar, although we validated a larger number of procedure codes than that study and the others.
Subgroup analyses showed PPVs of 94.1% for abdominal codes, 100% for perianal codes and 98.1% for other IBD-related surgery codes. These findings correspond well with the overall PPV. Our PPV results for abdominal resection codes can be compared with those reported by Ma et al of 87%, 81% and 100% for partial excision of small intestine, partial excision of large intestine and total excision of large intestine, respectively[5].
The coverage of the NPR has changed over time. Since 1997, the NPR has included day surgery. In 1993, it became mandatory to register surgical procedure codes (although such registration was already done to a considerable extent before 1993). Our study included codes registered between 1966 and 2014. Falkeborn et al validated codes for gynaecological surgery registered in 1965-1983[21] and Lagergren and Derogar validated codes for oesophageal surgery registered in 1987-2005[17]. The different periods of inclusion may limit comparisons with our study. Tao et al included patients only during 2011, making direct comparisons difficult[18].
Sensitivity studies on the NPR are scarce. One study found a sensitivity of 91% for any surgical procedure code during hospital admission of 962 patients in 1986[22], and another reported a sensitivity of over 97% for gynaecological procedure codes in 1965-1983[21]. Both studies were conducted before it became mandatory to register procedure codes in the NPR, whereas our study included codes from both before and after that requirement. The higher sensitivity for gynaecological codes could be related to the smaller number of different codes in that study. We also included minor surgical interventions, such as perianal procedures, which are possibly less likely to be registered in the NPR because of the less complicated nature of the procedures. Our results show a sensitivity of 94.5% for IBD-related surgical procedure codes, which is consistent with studies of similar codes in the NPR. However, it is higher than the sensitivity of 79-86% reported by Ma et al[5]. The classification system for procedure codes changed in 1997. When comparing procedure codes registered up until 1996 with those registered in 1997 or later, we found sensitivities of 90.6% and 98.7%, respectively. We speculate that the higher sensitivity from 1997 onwards could be related to the introduction in 1993 of mandatory registration of surgical procedures in the NPR. However, any differences over time should be interpreted with caution because of small numbers.
Strengths and limitations
Retrospective review of patient charts should be the method of choice when validating surgical procedure codes in the NPR. This methodology has several strengths, including accurate determination of the concordance between the charts and the NPR and the ability to accurately categorise the types of error. Further strengths of our study include the random, nationwide, population-based sampling of patients, which reduces the risk of selection bias. Moreover, our study included surgical procedure codes registered between 1966 and 2014, allowing for assessment of the PPV and sensitivity of the NPR over time. The Swedish healthcare system, offering free access to equal care regardless of income and place of residency, provides high external validity as compared with similar healthcare systems such as those of the other Scandinavian countries. The results of this study give an estimate, not previously available, of the validity of surgical procedure codes for IBD-related surgery in the NPR. The results allow future studies to accurately investigate the efficacy of surgery and surgical complications in various subgroups of IBD-related surgical procedure codes in the NPR. The overlap of codes between IBD-related surgery and other abdominal or perianal surgery lends the results external validity.
This study has some limitations. Because the reviewed charts were drawn from the sample of a previous validation study[16] that did not receive all requested charts, we cannot exclude that missing charts have biased our results. However, the patients included still represent a random nationwide sample, and our 20% surgery rate (53/258) for the included patients can be compared with an expected lifetime risk of surgery of 20% to 50% for patients with IBD[2-8]. We do not expect the missing charts to be significantly different from the charts included. In addition, because the classification system for procedure codes changed in 1997, it cannot be excluded that the larger number of procedure codes introduced after the change influenced the risk of misclassification. However, the notes in the charts served as the gold standard and therefore the risk of misclassification caused by the individual surgeon was minimal.
The proportion of reviewed patient charts that included specific surgical notes was 71% (n=110). Review of the remaining charts was based on other notes, which increases the risk of misclassification of these cases compared with cases confirmed using surgical notes. This limitation, however, was addressed by including only those procedures supported by other unambiguous notes that are equivalent to surgical notes. The manual review of the charts provides a robust reviewing process. Still, it could introduce misclassification by technical translation errors or by human error. The abstracted data were therefore reviewed twice by AF to minimise transfer errors. Furthermore, the surgical notes used for validation included detailed separate descriptions of the surgical procedures and techniques used, reducing the risk of misclassification. The review was done by a single reviewer. The chart reviewer was not blinded, which might have biased the assessment of the codes.
The number of procedure codes used in IBD-related surgery is larger than the number of validated codes in this study. Nevertheless, we found and validated 60 different types of surgical procedure codes that covered the most frequently used procedures in IBD surgery (Table S1). Although the validation was limited to patients with at least one IBD diagnosis in the NPR, the validity of the investigated procedure codes is likely to be generalisable to patients without IBD.
The 95% CIs for the presented accuracy measures were adjusted for clustering only at the hospital level using a two-step bootstrap approach. To our knowledge, there is no support for clustering at lower hierarchical levels as well[19, 20]. The clustering was made in strict hierarchy, with the exception of one patient who underwent surgery in two different hospitals.
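A two-step bootstrap with hospital-level clustering can be sketched as follows. This is one common variant (resample hospitals with replacement, then resample codes within each sampled hospital, and take percentile limits) and is not necessarily the exact procedure used in this study. The function name, the data layout and the toy data are hypothetical and serve only as illustration.

```python
import random


def two_step_bootstrap_ci(clusters, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for a proportion, resampling at the hospital level.

    clusters: dict mapping a hospital id to a list of 0/1 outcomes
    (1 = registry code confirmed by the chart). Hypothetical layout.
    """
    rng = random.Random(seed)
    hospital_ids = list(clusters)
    estimates = []
    for _ in range(n_boot):
        # Step 1: resample hospitals with replacement.
        sampled = rng.choices(hospital_ids, k=len(hospital_ids))
        outcomes = []
        for hospital in sampled:
            # Step 2: resample codes within each sampled hospital.
            codes = clusters[hospital]
            outcomes.extend(rng.choices(codes, k=len(codes)))
        estimates.append(sum(outcomes) / len(outcomes))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (not study data): three hospitals with a few validated codes each.
toy = {"A": [1, 1, 1, 0, 1], "B": [1, 1, 1, 1], "C": [1, 0, 1, 1, 1, 1]}
lower, upper = two_step_bootstrap_ci(toy)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

Resampling whole hospitals first lets between-hospital variation in coding practice widen the interval, which a bootstrap that treats all codes as independent would not capture.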
Finally, because only the admission date is usually listed in the NPR, we explored the actual date of surgery through patient charts. In studies examining outcomes after surgery, we recommend that the difference between the hospital admission date and the actual date of surgery (in this study, median: 1 day; mean: 2.1 days) be taken into account.