This study aimed to identify the ideal combination of drain amylase cutoff and postoperative day on which to test drain amylase. Despite a large volume of published literature addressing this question, there is no clear consensus. We found that testing on postoperative day 3–5 with an amylase cutoff of 300 U/L is the most internally valid combination. This screening approach provides the best overall combination of sensitivity, specificity, and NPV. Further, patients with a drain amylase level above either cutoff, 300 U/L or 5000 U/L, were significantly more likely to have a CR-POPF. Interestingly, using a cutoff of > 5000 U/L missed approximately 6% of the CR-POPF that would have been identified using a cutoff of > 300 U/L.
Data supporting drain placement after pancreaticoduodenectomy are conflicting. Two randomized controlled trials argue against the use of routine drainage [11, 16]. Both studies, however, had limitations. One included patients undergoing pancreatectomy and thus was not limited to pancreaticoduodenectomies, while the second had a high number of protocol violations, i.e., drains placed in the no-drain group. In contrast, a third randomized trial enrolling patients from multiple centers across the United States was terminated early because of increased mortality in the no-drain group [9]. In a subsequent subset analysis planned a priori, the investigators noted that the increased mortality was confined to high-risk patients [29]. Based on these data, routine drain placement remains standard care for most patients.
Given the potential downsides of prolonged or unnecessary drainage, optimal drain management is imperative after pancreaticoduodenectomy. Prolonged drainage may result in increased rates of infection and anastomotic dehiscence; however, removing a drain too early risks an undrained pancreatic leak [9, 30]. Current guidelines from the European and American ERAS Societies recommend removal on postoperative day three if the drain amylase level is < 5000 U/L on day one [17]. These recommendations are derived from prospective and retrospective studies. Bassi et al. conducted a randomized controlled study of 114 patients and reported decreased CR-POPF with early drain removal on day three versus day five [12]. Ven Fong et al. found that drain amylase levels on day one were predictive of CR-POPF [13].
Although several investigators report similar predictive values for postoperative day one amylase levels, most guidelines recommend removal of drains on postoperative day three [21]. There are no clear data to explain why testing and removal are separated by two days. We found that, among patients who developed CR-POPF, more had their highest amylase measurement further from surgery. Among those who did not develop a CR-POPF, we saw a progressive decrease in risk on later days. Nissen et al. found this same temporal trend [26]. These data suggest that measuring drain amylase earlier may lead to overlooking some patients with CR-POPF. In contrast, a meta-analysis with pooled results from 10 trials compared testing on days one and three [31]. Testing on postoperative day one had higher sensitivity, specificity, and AUC than testing on day three (sensitivity 81%, specificity 87%, AUC 0.89 vs. sensitivity 56%, specificity 79%, AUC 0.67). This clearly opposes our results; however, all but one of the 10 included studies were conducted at single institutions (N = 65–471), the exception being one multicenter study (N = 1,239). In contrast, Lee et al. reported drain amylase on postoperative day three as the superior predictor of CR-POPF (AUC 0.89, CI: 0.82–0.96) when compared to day one (AUC 0.78, CI: 0.69–0.87) or day five (AUC 0.76, CI: 0.66–0.85) [22]. These findings are consistent with our sensitivity calculations, which reflect increased sensitivity for levels reported on postoperative day three.
Numerous reports have examined potential drain amylase cutoff levels. Proposed cutoffs range from 100 to 5000 U/L, leaving open the question of which cutoff is ideal [5, 13, 18, 25–27, 32, 33]. These studies had smaller sample sizes than our study, and most were performed at single institutions. Ven Fong et al. included 126 patients and reported that 600 U/L afforded the best accuracy (86%), sensitivity (93%), and specificity (79%) [13]. This level was further validated in a second cohort of 369 patients. Using similar study designs, Sutcliffe et al., Maggino et al., and Kawai et al. recommended cutoffs of 350 U/L, 2,000 U/L, and 4,000 U/L, respectively [18, 32, 33]. Based on our analysis, we conclude that a cutoff of > 300 U/L allowed for a lower rate of missed CR-POPF than a cutoff of > 5000 U/L; 4.9% of those with an amylase < 300 U/L experienced a CR-POPF compared to 11.2% of those with an amylase < 5000 U/L. Our data benefit from a large sample size across a variety of practice settings.
Based upon our data, we identified postoperative day 3–5 with a drain amylase level of > 300 U/L to be the combination with the most internal validity. This combination is associated with a high sensitivity (72%), high specificity (81%), and high NPV (94%). Although all other combinations had a higher specificity, ranging from 93% to 98%, we accepted a slightly lower specificity and placed more importance on sensitivity in order to minimize the number of missed CR-POPF. A retrospective study by Mannen et al. evaluated 57 patients who underwent pancreaticoduodenectomy and reported similar findings [23]. The authors reported that amylase levels of > 500 U/L on day three were associated with a sensitivity and specificity of 83% and 79%, respectively.
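For readers less familiar with these measures, the following minimal sketch shows how sensitivity, specificity, and NPV are derived from a 2x2 table of test result against outcome. The function name and the counts are hypothetical and chosen for illustration only; they are not study data and do not reproduce our analysis.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, and NPV from 2x2 table counts.

    tp: amylase above cutoff, CR-POPF present
    fp: amylase above cutoff, CR-POPF absent
    fn: amylase at or below cutoff, CR-POPF present (missed fistula)
    tn: amylase at or below cutoff, CR-POPF absent
    """
    sensitivity = tp / (tp + fn)  # share of CR-POPF flagged by the test
    specificity = tn / (tn + fp)  # share of non-CR-POPF correctly cleared
    npv = tn / (tn + fn)          # share of negative tests that are truly negative
    return {"sensitivity": sensitivity, "specificity": specificity, "npv": npv}


# Hypothetical counts for illustration only; NOT taken from the study.
example = diagnostic_metrics(tp=40, fp=30, fn=10, tn=120)
for name, value in example.items():
    print(f"{name}: {value:.1%}")
```

In this framing, lowering the cutoff moves patients from the fn cell to the tp cell (raising sensitivity) at the cost of moving some from tn to fp (lowering specificity), which is the trade-off described above.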
This study is not without its limitations. First, inherent in all retrospective studies is an inability to validate the variables used to create the model. For example, we are unable to go back and confirm that each patient included did in fact have an amylase level measured on every postoperative day. Second, we used a range of postoperative days 3–5, which is not normally seen in practice. With current recommendations, early drain removal on postoperative day 3 will occur in most cases. The use of this range makes our results less generalizable in the current practice climate. Finally, measures of diagnostic accuracy that depend on prevalence, such as negative predictive value, may not be generalizable beyond the study population. However, with the use of a large national database such as NSQIP, this is less of a concern.
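The dependence of NPV on prevalence can be made concrete with a short sketch. It holds sensitivity and specificity fixed at the values reported above (72% and 81%) and recomputes NPV via Bayes' rule across a range of CR-POPF prevalences. The prevalence values and the function name are assumptions chosen for illustration, not estimates from our cohort.

```python
def npv_from_prevalence(sensitivity, specificity, prevalence):
    """NPV implied by fixed sensitivity and specificity at a given prevalence."""
    true_neg = specificity * (1 - prevalence)    # P(test negative, no CR-POPF)
    false_neg = (1 - sensitivity) * prevalence   # P(test negative, CR-POPF)
    return true_neg / (true_neg + false_neg)


SENSITIVITY = 0.72  # reported for day 3-5, cutoff > 300 U/L
SPECIFICITY = 0.81  # reported for day 3-5, cutoff > 300 U/L

# Hypothetical prevalences for illustration only.
for prevalence in (0.05, 0.10, 0.20, 0.30):
    npv = npv_from_prevalence(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: NPV {npv:.1%}")
```

With test characteristics held constant, NPV falls as prevalence rises, so a center with a higher CR-POPF rate than the study population should expect a lower NPV from the same cutoff.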