Does Serum Bilirubin Add Value in the Diagnosis of Appendicitis? A Multi-Institution Analysis of 2024 Patients


BACKGROUND Even with modern diagnostics, appendicitis can be difficult to diagnose accurately. Negative appendicectomies (NA) and delayed diagnosis of complicated appendicitis (CA, i.e. perforation or abscess) remain common. Serum bilirubin has been proposed as an additional biomarker to assist with the diagnosis of appendicitis. In this large series, we assessed the value of bilirubin in the diagnosis of appendicitis.
METHODS A retrospective review of patients with suspected appendicitis at three hospitals over a three year period was performed. All consecutive patients with appendicectomy were included. In addition, a "discharged" sub-group of consecutive patients who were admitted with suspected appendicitis but discharged without an operation was also identified. Demographic data, presence of fever, tachycardia, total white cell count (WCC), neutrophil count, total serum bilirubin, operative findings and final histology were recorded. Multivariate logistic regression was performed to determine independent predictors of appendicitis and CA. Receiver-operator analysis was performed to compare bilirubin to WCC and neutrophil count.
RESULTS There were 2024 patients: 1167 had uncomplicated appendicitis, 355 had CA and 303 underwent NA. 200 non-surgical "discharged" patients were included for comparison. Compared to those without appendicitis (NA and discharged groups), increased serum bilirubin was associated with an increased likelihood of appendicitis (OR 1.030, 95% CI (1.013, 1.048), p<0.0001) and increased likelihood of CA (OR 1.035, 95% CI (1.021, 1.050), p<0.001). These results remained significant when the discharged group, NA group and uncomplicated appendicitis groups were analyzed separately. The sensitivity and specificity of bilirubin were inferior to those of neutrophil count for the diagnosis of appendicitis (AUC 0.657 versus 0.725, p<0.0001). Bilirubin, WCC and neutrophils were all relatively insensitive and non-specific over a variety of cut-off values, and combinations did not improve their accuracy.
CONCLUSION Hyperbilirubinaemia was independently associated with an increased likelihood of both uncomplicated and complicated appendicitis; however, it had similar sensitivity and specificity to WCC and neutrophils. Bilirubin, neutrophil count and WCC are not discriminatory enough to be used in isolation but may be incrementally useful adjuncts in the pre-operative assessment of patients with suspected appendicitis.


Introduction
Appendicitis is a common condition, with a lifetime incidence of 7-8% in developed countries (1,2). In the modern era, appendicitis can still be difficult to diagnose accurately. In Australia, despite the use of imaging in up to 70% of patients undergoing appendicectomy, 19% of all appendicectomy specimens are "negative appendicectomies" that do not demonstrate inflammation (3). National audits demonstrate similar negative appendicectomy rates in other developed countries (4). In addition, a delay to diagnosis has been correlated with the development of complicated appendicitis (perforation, abscess and phlegmon) with associated increased morbidity (5).
Routine laboratory investigations in the work-up of patients with suspected appendicitis include white blood cell count (WCC) and neutrophil count. In addition, many other serum and urinary biomarkers have been assessed with the aim of improving the diagnostic accuracy and assessment of severity of appendicitis, including CA-125 (6), procalcitonin (7), bilirubin and calprotectin (8). Hyperbilirubinaemia is observed in many inflammatory states, and bilirubin is increasingly used as a cheap, readily available and ubiquitous marker of illness severity in many hospital settings (9,10). Its role in the diagnosis of appendicitis has been examined in several series and a systematic review (11,12).
Most previous studies that have assessed the role of bilirubin in diagnosing appendicitis have been small, with limited multivariate statistical analysis to account for relevant confounding factors. Most series have focused on the use of bilirubin to differentiate between uncomplicated and complicated appendicitis, excluding negative appendicectomy and those without appendicitis. The aim of this study was to assess the independent diagnostic value of serum bilirubin, relative to other established diagnostic criteria, for both complicated and uncomplicated appendicitis.

Methods
A retrospective review was performed of patients with suspected appendicitis at three large tertiary hospitals with 24-hour acute surgical services over a three-year period from January 2016 to December 2019. All consecutive operative cases were identified through searching operative theatre software records, assessing for all cases of laparoscopy or appendicectomy, in addition to searching surgical team daily patient lists for all patients with appendicitis. Non-operative patients were identified through searching surgical team daily patient lists. Ethical approval for this study was sought and obtained from the RBWH Human Research Ethics Committee.
Data were extracted from the electronic medical records, including patient demographics, pregnancy status (urine or serum β-hCG), observations, laboratory findings at time of admission (specifically white blood cell count, neutrophil count and total serum bilirubin level), radiological findings, comorbidities, operation notes and histological findings of the removed appendix. The highest recorded heart rate at admission and any documented temperature of 37.8°C or above were noted. Due to limitations in recording and software, heart rate and fever were recorded as dichotomized categorical variables (tachycardia was defined as a heart rate above 100 beats per minute, fever was defined as a temperature of 37.8°C or above). Non-operative patients with uncomplicated and complicated appendicitis were included provided there was clear radiological evidence of the diagnosis on Computed Tomography (CT). Laboratory investigations of interest were those taken at presentation of the patient to the emergency department, prior to surgery.
Patients who underwent surgery were divided into three groups: those with no histological evidence of appendicitis (negative appendicectomy), those with complicated appendicitis and those with uncomplicated appendicitis. Complicated appendicitis was defined as perforation or abscess cavity encountered intra-operatively, or as clearly described in the histological report. For those with non-operative management, complicated appendicitis was defined as CT findings consistent with complicated appendicitis (such as pneumoperitoneum or abscess). Surgical patients were excluded from analysis if histology demonstrated pathology other than acute appendicitis or negative appendicectomy (e.g. parasitic worms, tumours/adenomas) or if the patients were pregnant, had Gilbert's disease or any known liver disease.
Although those with negative appendicectomy by definition do not have appendicitis, these patients may differ from those ultimately discharged without an operation, as the pre-surgery probability of having appendicitis was sufficiently high for the surgeon to offer surgery. Thus, surgical admission lists were searched to identify a fourth group of consecutive patients (referred to as the "discharged" group) who had been referred and admitted under the surgical team with suspected appendicitis but were subsequently discharged without an operation. Based on the number of surgical patients identified in this study, only 60 discharged patients would be required to detect a one-sided mean difference in bilirubin of 3 µmol/L or higher with 80 percent power; however, a larger number of 200 was chosen to ensure sufficient power for multivariate analysis (13). These were patients aged 14 and older and included those who subsequently underwent radiological investigations that were found to be negative, or who underwent clinical observation only, and who did not undergo appendicectomy within a year of this admission. These patients were only included if there was clear documentation from the surgical staff that appendicitis was suspected at the time of admission.
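The sample-size reasoning above can be illustrated with a short sketch. It uses a normal approximation for a one-sided two-sample comparison of means in which the surgical group size is fixed (1825 surgical patients in this series) and the size of the discharged group is solved for. This is not necessarily the method of reference (13): the function name and the standard deviation of 9 µmol/L are hypothetical values chosen for illustration only and are not reported in the text.

```python
from math import ceil
from statistics import NormalDist


def second_group_size(delta, sd, n_first, alpha=0.05, power=0.80):
    """Smallest second-group size needed to detect a mean difference
    `delta` with a one-sided two-sample z-test, given a first group of
    fixed size `n_first` and a common standard deviation `sd`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)       # quantile for the target power
    # The variance of the difference in means must satisfy
    # 1/n_first + 1/n_second <= (delta / (sd * (z_alpha + z_beta)))**2
    budget = (delta / (sd * (z_alpha + z_beta))) ** 2
    remaining = budget - 1.0 / n_first
    if remaining <= 0:
        raise ValueError("first group is too small for this effect size")
    return ceil(1.0 / remaining)


# ASSUMPTION: sd=9.0 is a made-up value; the source does not report it.
n_discharged = second_group_size(delta=3.0, sd=9.0, n_first=1825)
print(n_discharged)
```

Because the surgical group is large, almost all of the variance in the comparison comes from the smaller discharged group, which is why a modest number of discharged patients can be sufficient for a univariate comparison.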
Three major comparisons were performed. The first comparison determined the risk factors associated with appendicitis (both complicated and uncomplicated) compared to patients without appendicitis (combining negative appendicectomy with the discharged group). To ensure the inclusion of complicated appendicitis did not skew the results, a sub-analysis was performed excluding the complicated appendicitis patients.
The second major comparison was between those with appendicitis and those who were discharged (i.e. excluding negative appendicectomy). Furthermore, differences in baseline characteristics were evaluated between the negative appendicectomy and discharged groups, as these groups might differ.
The third major comparison was within the subgroup of those with histologically proven appendicitis only, determining the risk factors associated with complicated appendicitis versus uncomplicated appendicitis.
Statistical analysis was performed in STATA version 15.1. Univariate analysis was performed to ascertain differences in the means of the continuous covariates (as these were approximately normally distributed, a two-sample t-test for equal means was used) and in proportions for categorical variables (using the Chi-squared test) for each of the comparisons listed. Multivariate logistic regression analyses were performed to ascertain the independent predictors of a negative appendicectomy and of complicated appendicitis, specifically adjusting for demographics (age and gender), fever and tachycardia.
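As an aid to reading the odds ratios from the logistic regression, the sketch below rescales a per-unit odds ratio to a larger increment. It assumes that the reported odds ratio for bilirubin (1.030, 95% CI 1.013 to 1.048) is expressed per 1 µmol/L increase; the text does not state the increment, so this is an assumption, and the function name is illustrative.

```python
def rescale_odds_ratio(or_per_unit, units):
    """Odds ratio for an increase of `units`, given a per-unit odds ratio.

    In a logistic model the log-odds are linear in the covariate, so the
    per-unit odds ratio is raised to the power of the increment.
    """
    return or_per_unit ** units


# ASSUMPTION: the reported OR of 1.030 is per 1 umol/L of bilirubin.
point = rescale_odds_ratio(1.030, 10)
lower = rescale_odds_ratio(1.013, 10)
upper = rescale_odds_ratio(1.048, 10)
print(f"OR per 10 units: {point:.3f} ({lower:.3f}, {upper:.3f})")
```

Under that assumption, an odds ratio that looks very close to 1 per unit corresponds to a more noticeable change in odds across a clinically meaningful difference in bilirubin, which helps explain how a small per-unit estimate can still be statistically significant.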
The diagnostic accuracy of bilirubin, WCC and neutrophils was evaluated using receiver operator curve (ROC) analysis; the areas under the curve (AUC) were ascertained and compared statistically using the roccomp program in STATA. To aid with interpretation of this analysis, we assessed the sensitivity, specificity, positive and negative predictive values at different cut-off values. The lower limit for detection of bilirubin was 4 and for WCC and neutrophils it was 0.
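The quantities named above can be made concrete with a minimal sketch. It computes sensitivity, specificity and predictive values at a single cut-off, and the AUC as the probability that a randomly chosen patient with appendicitis has a higher biomarker value than a randomly chosen patient without (ties counted as one half). The data are invented toy values, not study data, and the statistical comparison of AUCs performed by roccomp is not reproduced here.

```python
def diagnostic_metrics(values, has_disease, cutoff):
    """Sensitivity, specificity, PPV and NPV when value >= cutoff is test-positive."""
    tp = fp = tn = fn = 0
    for value, diseased in zip(values, has_disease):
        test_positive = value >= cutoff
        if test_positive and diseased:
            tp += 1
        elif test_positive:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(values, has_disease):
    """AUC via pairwise comparison of diseased and non-diseased values."""
    pos = [v for v, d in zip(values, has_disease) if d]
    neg = [v for v, d in zip(values, has_disease) if not d]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): five patients without and five with appendicitis.
toy_bilirubin = [8, 10, 12, 15, 22, 14, 18, 25, 30, 35]
toy_appendicitis = [False] * 5 + [True] * 5

print(diagnostic_metrics(toy_bilirubin, toy_appendicitis, cutoff=20))
print(auc(toy_bilirubin, toy_appendicitis))
```

Raising the cut-off trades sensitivity for specificity, whereas the AUC summarizes discrimination across all cut-offs at once.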
To evaluate if combinations of biomarkers improved results, combinations of cut-off points for biomarkers were evaluated for sensitivity and specificity.
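One way to evaluate a combination of cut-off points is sketched below. The text does not specify whether a combined test was counted as positive when either marker or when both markers reached their cut-offs, so both rules are shown; the function name, the rule names and the toy data are illustrative assumptions rather than the study's method or results.

```python
def combined_rule_metrics(bili, neut, has_disease, bili_cut, neut_cut, mode="either"):
    """Sensitivity and specificity of a two-marker rule.

    mode="either": positive if either marker reaches its cut-off.
    mode="both":   positive only if both markers reach their cut-offs.
    """
    if mode not in ("either", "both"):
        raise ValueError("mode must be 'either' or 'both'")
    tp = fp = tn = fn = 0
    for b, n, diseased in zip(bili, neut, has_disease):
        b_pos, n_pos = b >= bili_cut, n >= neut_cut
        test_positive = (b_pos or n_pos) if mode == "either" else (b_pos and n_pos)
        if test_positive and diseased:
            tp += 1
        elif test_positive:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


# Toy data (hypothetical): five patients without and five with appendicitis.
toy_bili = [8, 10, 12, 15, 22, 14, 18, 25, 30, 35]
toy_neut = [4, 9, 5, 6, 7, 10, 6, 12, 8, 11]
toy_disease = [False] * 5 + [True] * 5

for rule in ("either", "both"):
    print(rule, combined_rule_metrics(toy_bili, toy_neut, toy_disease, 20, 8, mode=rule))
```

An "either" rule can only raise sensitivity at the cost of specificity, and a "both" rule does the reverse, which is one reason a combination of two weak markers may fail to improve overall accuracy.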
Neutrophil and white cell counts are given as count × 10^9 per liter. Bilirubin level refers to the total bilirubin in micromoles per liter (µmol/L).

Results
Of 2121 patients identified, 2024 were included for analysis and 97 were excluded. Of the patients included for analysis, 1167 had uncomplicated appendicitis, 355 had complicated appendicitis, 303 underwent a negative appendicectomy and an additional 200 patients were in the comparison discharged group. Of the 97 excluded patients, 19 had tumors or adenomas of the appendix, 36 were pregnant and 42 had alternative diagnoses (e.g. intestinal worms).
Demographics, clinical findings and serum biomarker levels are given for all patients as well as those with and without appendicitis (i.e. negative appendicectomy and discharged group combined) in Table 1, in addition to the results of the univariate and multivariate analysis.
To ensure the inclusion of complicated appendicitis patients did not skew the above results, the multivariate analysis was repeated comparing non-appendicitis (n = 503) to uncomplicated appendicitis alone (n = 1167), excluding complicated appendicitis. Compared to the non-appendicitis group, the independent predictors of uncomplicated appendicitis were male gender (OR 0.4495, 95% CI (0.350, These results show that hyperbilirubinaemia is also an independent predictor of uncomplicated appendicitis. Receiver operator curve analysis (Figure 1) demonstrated increased neutrophil count (AUC 0.725) to be the most accurate biomarker for appendicitis compared to increased WCC (AUC 0.708, p<0.0001) and increased bilirubin (AUC 0.657, p = 0.0001). Bilirubin and WCC did not have significantly different AUC (p = 0.28). Therefore, all biomarkers are correlated with an increased likelihood of appendicitis; however, increased neutrophil count was the most sensitive and specific test to differentiate those with appendicitis from those without.

Appendicitis versus Discharged Group (Excluding Negative Appendicectomy)
Compared to the group of patients discharged without an operation, the negative appendicectomy group were younger (mean age 25.8 versus 31.5, p<0.0001) and more likely to be febrile at presentation (15% versus 6%, p = 0.002) and tachycardic (21% versus 12%, p = 0.013). There were no significant differences in the neutrophils, WCC, bilirubin or proportion of patients who were female.
Univariate and multivariate analyses were performed comparing patients with appendicitis with the discharged group, excluding the negative appendicectomy patients (Table 2). On univariate analysis, the bilirubin was higher in the appendicitis group (mean 17.9 versus 14.4, p = 0.0002); however, this was not significant in the multivariate analysis (Table 2).
Caption: The negative appendicectomy group were more likely to be tachycardic, febrile and younger than the discharged group, so analysis was repeated with these patients excluded. WCC White Cell Count.
* Omitted from logistic regression model due to co-linearity with neutrophils.

Uncomplicated versus Complicated Appendicitis
The results of the univariate and multivariate comparison of uncomplicated versus complicated appendicitis are provided in  An odds ratio of greater than 1 indicated increased odds of complicated appendicitis.
Receiver operator analysis (Figure 2) demonstrated that the AUC for bilirubin (0.6451) and neutrophils (0.6455) were higher compared to WCC (0.621, p<0.001); however, the AUC for neutrophils was not statistically significantly different to the AUC for bilirubin (p = 0.97). Thus, the sensitivity and specificity of bilirubin for the detection of complicated appendicitis was no better or worse than neutrophils.

Diagnostic Accuracy and Cut-Off Values
Tables 4 and 5 provide a list of sensitivities, specificities, positive predictive values (PPVs) and negative predictive values (NPVs) for each of the three biomarkers at different cut-off levels for appendicitis versus non-appendicitis and uncomplicated versus complicated appendicitis. The sensitivity, specificity, PPVs and NPVs were relatively poor over a range of values, mirroring the results from the ROC analysis.
Combinations of biomarkers were assessed to ascertain if there was any improvement in diagnostic accuracy. As the ROC analysis demonstrated that WCC was inferior to bilirubin and neutrophils, only the combination of neutrophils and bilirubin was evaluated. Table 6 provides sensitivity and specificity for combined cut-off values over a selected representative range of bilirubin and neutrophils. Compared to Tables 4 and 5, combining neutrophils and bilirubin did not markedly improve the sensitivity or specificity.
Caption: Selected combinations of cut-off values of bilirubin and neutrophils and corresponding sensitivity and specificity for the two comparisons. White cell count was not evaluated as it was the poorest biomarker in the initial analysis.

Discussion
To our knowledge, this is the largest comprehensive series to examine the role of bilirubin in the diagnosis of appendicitis and the first large series to compare those with both complicated and uncomplicated appendicitis to those without appendicitis, including a group of patients admitted with suspected appendicitis who did not undergo surgery. Previous studies have concluded bilirubin is a useful marker in the setting of suspected complicated appendicitis. We have shown that the utility of bilirubin in the diagnosis of appendicitis was limited due to poor sensitivity and specificity, and it is unlikely that the bilirubin level will influence decision making in the majority of cases.
A 2017 comprehensive meta-analysis reviewed the sensitivity and specificity for the most common biomarkers in appendicitis and included studies until 2016 (12). Five of the studies in the meta-analysis had more than 1000 patients (the largest had 1271 patients), with the majority being single-institution studies (14-18). Most results were from univariate analysis, with one study (17) performing a limited multivariate analysis including clinical data, evaluating whether the bilirubin level was independently correlated with perforation risk. With respect to sensitivity and specificity, some studies demonstrated superior receiver-operator curves for bilirubin when compared to CRP and WCC; however, other studies demonstrated the opposite. Since this meta-analysis, several further studies have been published; however, the number of patients in these studies has been small (19,20).
One study assessed the role of the bilirubin level comparing patients with histologic appendicitis to those with a negative appendicectomy (14); however, none of these studies included patients who were discharged without an operation after a period of observation. Any comparison that focuses on the negative appendicectomy group needs to be treated with caution, as we have shown that patients with a histologic negative appendicectomy differed significantly from those who were discharged: they were more likely to be tachycardic, febrile and younger. This may account for why the surgeon opted to perform appendicectomy in these patients rather than discharge them, as they had a higher pre-surgery probability of having appendicitis. This highlights that the correct comparison group should also include those who were discharged without surgery. To our knowledge, this is the first large series to examine the discharged group.
The larger series above evaluated the role of bilirubin in distinguishing perforated versus non-perforated appendicitis, concluding that increased bilirubin was a marker of perforation. However, our data show a correlation between an elevated bilirubin and the likelihood of both complicated and uncomplicated appendicitis, thus indicating a problem with specificity for this outcome.

Practical Interpretation of Bilirubin for the Clinician
From a practical perspective, the surgeon evaluating a patient with suspected appendicitis will be interested in what the bilirubin level means and what are optimal "cut-off" values for the diagnosis of appendicitis and complicated appendicitis. Results of ROC analysis unfortunately demonstrate that all biomarkers were relatively insensitive and non-specific over a variety of cut-off values and combinations.
The tables suggest that the biomarkers are only independently useful at extreme low or high values. For instance, a bilirubin level of 4 or less had a 100% negative predictive value for excluding complicated appendicitis. If the bilirubin level is over 30, this is 96.9% specific for appendicitis and 92.6% specific for complicated appendicitis: a high value such as this in a patient with equivocal clinical findings and normal WCC may guide the clinician towards observation, a CT scan or diagnostic laparoscopy, rather than discharge.
For non-extreme values, although moderately increased levels of any biomarker statistically portend an increased likelihood of both uncomplicated and complicated appendicitis, there are no optimal "cut-off" values that are sufficiently sensitive or specific to use in isolation when evaluating the patient. Thus a moderately increased or normal bilirubin level will be interpreted similarly to a moderately increased or normal white cell count or neutrophil count, in the context of other history and examination findings when evaluating a patient with right iliac fossa pain.
Given the quality of medical imaging for suspected appendicitis in 2021 and the numerous other independent predictors of appendicitis and complicated appendicitis seen in this report, it is difficult to conceive a situation in which the bilirubin would play a pivotal role in the diagnosis of appendicitis in any patient. Although the above insights are interesting from a statistical and pathophysiological standpoint, in our view, the significance of the bilirubin level in suspected appendicitis should not be overemphasized.
Should liver function tests and bilirubin be performed as a matter of routine for all patients with suspected appendicitis? At our institutions, liver function tests and bilirubin are ordered as routine for all patients admitted to the emergency department with abdominal pain, thus our physicians usually have the luxury of having this information. In institutions where ordering bilirubin is not routine, the cost of ordering additional investigations should be balanced against the likelihood that the investigation will change management: the analysis of this series would suggest that ordering bilirubin would not be cost-effective due to the poor diagnostic sensitivity and specificity. A satirical yet informative Australian study (21) demonstrated a marked reduction in ordering C-reactive protein (CRP) if the physician had to pay a small fine (Australian $1). At our institutions, such a debate is particularly relevant for CRP, which is not universally ordered for many conditions, where the cost is weighed against the likelihood of management change. Such an issue is beyond the scope of this paper but was addressed in the meta-analysis (12), which concluded that common biomarkers (neutrophils, bilirubin etc.) were cheap but relatively inaccurate. More exotic biomarkers (IL-6, procalcitonin etc.) were more accurate; however, they were significantly more expensive, not always available and took more time to process.

Strengths and Limitations
This study is strengthened by its multi-institution design, large sample size and the most rigorous multivariate analysis on this topic to date, with the inclusion of multiple objective clinical covariates to determine the independent predictive value of bilirubin. To our knowledge, this is the only large series to evaluate the role of bilirubin in suspected appendicitis when compared to patients without appendicitis. Patients with non-operative management were included and a comprehensive patient identification strategy was utilized to ensure all consecutive patients from each institution were included.
As would be expected, our data indicate that younger patients and female patients are less likely to have appendicitis, a finding that has been mirrored in other studies (22,23). A major strength of this study is the use of multivariate statistics to account for the effect of demographics, fever and presence of tachycardia.
Limitations include the retrospective design and no analysis of C-reactive protein (CRP). Ordering a CRP is not a uniform practice at our institutions thus not all patients had a CRP level, and it was felt performing analysis would be misleading.

Relevance to Current Research and Future Directions
Virtually all facets of appendicitis have come under investigation in contemporary research. Recent studies have investigated methods to reduce unnecessary imaging in the evaluation of right iliac fossa pain (24), non-operative management of uncomplicated appendicitis (25), the monetary costs of negative appendicectomy (26) and the need and timing for surgery in complicated appendicitis (27). All such contemporary issues are reliant on the rapid and reliable diagnosis of appendicitis, for which an efficacious biomarker would immediately be of value.
In addition to being cheap and rapidly and readily available, an ideal biomarker for the diagnosis of appendicitis would be sensitive and specific enough to reduce the negative appendicectomy rate and increase the timeliness of diagnosis (and thus theoretically reduce the complicated appendicitis rate).
Although this series has demonstrated several additional insights into the role of bilirubin in appendicitis, bilirubin unfortunately does not appear to fulfil all these criteria and it is unlikely a further large series would reveal any additional insights beyond what has already been discovered. Bilirubin will rarely be pivotal in surgical decision making for appendicitis but it does offer some incremental value.

Conclusions
A higher level of bilirubin is independently associated with an increased likelihood of both uncomplicated and complicated appendicitis. Although statistically significant as a marker of the presence of the disease and the degree of inflammation, it is not clinically useful because the accuracy is not high, and it remains a non-specific test whether used alone or in combination with other inflammatory markers. Although high levels are associated with a more complicated form of the disease, as an investigation it is unlikely that the bilirubin level would be pivotal in the evaluation and management of most patients with acute appendicitis.
-Consent for publication: Not applicable (no individual or identifying patient data are presented).

Abbreviations
-Availability of data and materials: If the manuscript is accepted, data can be made available on reasonable request.
-Competing Interests: There are no competing interests and none of the authors have any relevant financial interests to declare.
-Funding: There was no funding for this study.
-Acknowledgements: Thank you Dr Donald Cameron (Townsville Hospital) for providing data for analysis.
Figure 1 Receiver-operator curves for bilirubin, white cell count and neutrophil count for the diagnosis of appendicitis versus non-appendicitis. Neutrophil count had a significantly higher area under the curve than white cell count and bilirubin (p<0.0001), indicating that neutrophil count has the highest sensitivity/specificity for the detection of appendicitis.