Comparison of clinical usefulness of serum Ca125 and CA19-9 in pancreatic adenocarcinoma diagnosis: meta-analysis and systematic review of literature

Abstract Introduction Pancreatic adenocarcinoma remains one of the most lethal cancers. The only recommended biomarker, CA19-9, is not accurate enough to establish a definite diagnosis. Therefore, determining the usefulness of other biomarkers is essential. Our aim was to compare the specificity and sensitivity of Ca125 and CA19-9 by means of meta-analysis. A systematic review of combined tests (CA19-9 + Ca125) was also performed. Methods We conducted a systematic search of MEDLINE (via PubMed) and EMBASE (via Ovid). After screening of abstracts and assessment of full texts, nine studies (number of patients, n = 1599) were included. A hierarchical summary receiver operating characteristic (hsROC) model was applied to estimate the diagnostic accuracy. Results CA19-9 sensitivity and specificity were 0.748 (95%CI 0.676–0.809) and 0.782 (95%CI 0.716–0.836), respectively. For Ca125, these values were estimated at 0.593 (95%CI 0.489–0.69) and 0.754 (95%CI 0.668–0.817). Regarding the heterogeneity of studies, a strong threshold effect for Ca125 and a moderate one for CA19-9 were found. Conclusions Our meta-analysis did not prove the superiority of Ca125. It should nevertheless be noted that the sparsity of studies precludes accurate analysis of the influence of various factors. The review of proposed combined tests shows that CA19-9 + Ca125 models are generally characterized by higher sensitivity.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) is a neoplasm with one of the highest mortality rates. Despite its relatively low incidence, according to the latest cancer statistics, it constitutes the 7th leading cause of cancer-related deaths worldwide. PDAC incidence is notably higher in countries with a high HDI (human development index) (Rawla et al. 2019). This epidemiological situation is caused essentially by the lack of early specific symptoms and thus late diagnosis of PDAC. In addition, clinicians do not have at their disposal any readily available diagnostic test to ascertain or exclude PDAC diagnosis with high probability. In the current guidelines of the European Society of Medical Oncology (ESMO), CA19-9 remains the sole recommended serum biomarker (Ducreux et al. 2015). Nevertheless, its shortcomings are well known. (i) Around 10% of the Caucasian population are so-called 'Lewis-antigen nonexpressors', which in turn leads to no expression of CA19-9 (Scarà et al. 2015). (ii) A surge in CA19-9 levels is fairly often seen in a plethora of other diseases (Goh et al. 2017). (iii) Early-stage PDAC is often seen without an increase in CA19-9. A previous meta-analysis concluded that CA19-9 sensitivity and specificity are around 80% (Huang and Liu 2014). Both false positive and false negative diagnoses have serious ramifications. While a false negative diagnosis obviously delays oncological treatment, a false positive result, in the clinical scenario of a pancreatic mass of benign aetiology, can be similarly detrimental. In such a scenario, patients misdiagnosed with PDAC undergo unnecessarily extensive surgery such as pancreatoduodenectomy or pancreatectomy. According to published studies, the rate of pancreatoduodenectomy due to misdiagnosed PDAC amounts to 5-12% of cases (Kennedy et al. 2006, van Heerde et al. 2012, Gerritsen et al. 2014, Gomes et al. 2016).
Moreover, routinely used imaging studies also lack high specificity and sensitivity in differentiating between a pancreatic mass due to, e.g., chronic pancreatitis and PDAC (Gerritsen et al. 2015, Toft et al. 2017). Ca125, like CA19-9, belongs to the high-mass glycoproteins. Its clinical usefulness was first described in the diagnosis of ovarian cancer. To date, Ca125 and HE4 constitute two independent factors in the ROMA (risk of ovarian malignancy algorithm) test, which is used to calculate the probability of ovarian cancer presence (Dochez et al. 2019).
Nevertheless, there is increasing evidence that Ca125 is also up-regulated in the development of PDAC. In our previous retrospective study, we found that Ca125 with the optimal cut-off point has a diagnostic accuracy matching that of CA19-9 (Hogendorf et al. 2017).
Therefore, we aimed to conduct the meta-analysis and the systematic review of literature to assess the Ca125 performance against CA19-9.

Clinical significance
To date, CA19-9 is the only approved and standardized biomarker for PDAC diagnosis. Only a few well-designed studies comparing CA19-9 with other conventional biomarkers have been carried out. In this meta-analysis, we studied the diagnostic accuracy of CA19-9 in comparison to Ca125. The analysis revealed that the specificity of Ca125 is noninferior to that of CA19-9. The sensitivity of Ca125, calculated from the pooled studies, is significantly lower. The results are nevertheless strongly influenced by the sparsity of studies and the threshold effect. The systematic review of combined tests concluded that a combination of CA19-9 + Ca125 in a simple model, such as logistic regression, can significantly improve diagnostic accuracy. Thus, these models can have valuable clinical potential.

Search strategy
The performed meta-analysis and systematic review were in line with the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines (Moher et al. 2009). We performed the search of the following databases: MEDLINE (via PubMed) and EMBASE (via Ovid). The last search was performed on 25 February 2020. We used the following search construct: pancreatic cancer OR pancreatic tumour OR pancreatic adenocarcinoma OR pancreatic lesion AND diagnosis AND ca125 AND CA19-9. The number of identified potentially eligible studies is shown in the PRISMA flow diagram (Figure 1).

Eligibility criteria
We established a priori the following eligibility criteria: i. Case-control or diagnostic cohort studies. ii. Studies published in English or German. iii. Study population of at least 60 participants. iv. Data included in the paper sufficient to construct a 2 × 2 diagnostic table. v. Histopathological examination as the gold standard.

Data extraction and study inclusion
Two authors (A.S. and A.D.) independently screened the records retrieved from the search. Selected records were further screened for eligibility in full text independently by the same investigators. Discrepancies at each stage of selection were arbitrated by a third reviewer (P.H.) and resolved by consensus.

Assessment of methodological quality
In order to assess the quality of each study, the quality assessment of diagnostic accuracy studies (QUADAS-2) tool was applied (Whiting et al. 2011). Two authors (A.S. and A.D.) independently filled out the assessment form for each included study. Discrepancies in the assessment were resolved by the third author (P.H.).

Data preparation
After extracting the full text of each included study, we built 2 × 2 diagnostic tables with the numbers of false positives and false negatives (FP and FN) and of true positives and true negatives (TP and TN). Moreover, for each study, the geographical origin of the population, whether an optimal Ca125 cut-off point was calculated, and the method of biomarker detection were coded as categorical variables for further exploration by meta-regression. The data are shown in Table 1.
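The step from a 2 × 2 diagnostic table to per-study sensitivity and specificity can be sketched as follows. This is a minimal illustration only: the class name `TwoByTwo` and the counts in `example` are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a 2 x 2 diagnostic table (hypothetical helper, not from the paper)."""
    tp: int  # diseased, test positive
    fp: int  # non-diseased, test positive
    fn: int  # diseased, test negative
    tn: int  # non-diseased, test negative

    @property
    def sensitivity(self) -> float:
        # Share of diseased patients correctly flagged by the test
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of non-diseased patients correctly cleared by the test
        return self.tn / (self.tn + self.fp)

    @property
    def prevalence(self) -> float:
        # Share of diseased patients in the whole study population
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.fn) / total


# Illustrative counts only
example = TwoByTwo(tp=60, fp=10, fn=40, tn=90)
print(example.sensitivity, example.specificity, example.prevalence)
```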

Publication bias
To statistically assess possible publication bias, the diagnostic odds ratio (DOR) of each study was calculated. Then, the natural logarithm transformation was performed (lnDOR) and its standard error (SE) was calculated. Finally, funnel plots were constructed by plotting precision (1/SE) against lnDOR. The trim and fill test was applied to evaluate the asymmetry of the resulting plot.
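The funnel-plot coordinates described above can be sketched as follows. The standard error uses the usual large-sample formula for a log odds ratio; the 0.5 continuity correction for tables with an empty cell is an assumption of this sketch (the paper does not state how such tables were handled), and the function name and counts are hypothetical.

```python
import math


def ln_dor_and_se(tp, fp, fn, tn, correction=0.5):
    """Return (lnDOR, SE) for one study's 2 x 2 table.

    Assumption: `correction` is added to every cell only when at least
    one cell is zero; this choice is illustrative, not from the paper.
    """
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    ln_dor = math.log((tp * tn) / (fp * fn))
    # Large-sample standard error of a log odds ratio
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return ln_dor, se


# Illustrative counts only
ln_dor, se = ln_dor_and_se(tp=60, fp=10, fn=40, tn=90)
precision = 1 / se  # y-axis of the funnel plot; lnDOR is the x-axis
print(ln_dor, se, precision)
```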

Meta-analysis methodology
Currently, univariate analysis methods are not recommended for data synthesis of diagnostic test studies (Leeflang 2014, Lee et al. 2015). Although there is no clear consensus regarding the optimal analysis methods, it is widely agreed that hierarchical summary receiver operating characteristic (hsROC) models are best suited for this purpose (Dinnes et al. 2016). Therefore, we decided to use the hsROC model first described by Reitsma et al. (2005). This model was applied to assess the pooled diagnostic accuracy of each biomarker.
For the exploration of study heterogeneity and subgroup analysis, the parametric transformation described by Doebler et al. (2012) was applied for bivariate meta-regression with maximum likelihood estimation.
Univariate approach was used for the graphical summary of the sensitivity and specificity of each study as a forest plot (random effects model was applied); however, the pooled specificity and sensitivity were generally not taken into account when comparing the biomarkers' performance.
All calculations were performed using the R programming language (R Core Team 2019) with two packages dedicated to meta-analysis, namely mada (Doebler 2019) and metafor (Balduzzi et al. 2019).

Studies' quality assessment and publication bias
Simultaneously with the primary data extraction, the risk of bias of the studies was evaluated as proposed by the QUADAS-2 tool (Table 2). The risk of bias in the patient selection domain was assessed as high for five studies (55.56%), due to case-control design (four studies) and exclusion of patients without 'confident clinical diagnosis'. Regarding applicability in this domain, we found the control groups' composition in two studies (22.22%) to be bias-prone, since they comprised, to some extent, acute pancreatitis and extra-pancreatic cases (Haglund 1986, Chan et al. 2014). Acute pancreatitis is normally easy to distinguish clinically from PDAC, thus its inclusion in the control group does not seem to be fully justified. In another case, the bias is attributed to the inclusion of pancreatic neuroendocrine tumours in the cancer group (Wang and Tian 2014). Lastly, the study by Gu et al. (2015) included exclusively PDAC patients undergoing chemotherapy. It was not stated clearly whether the sample taken for diagnostic purposes was obtained before the commencement of chemotherapy in every case, thus we assessed the risk as unclear.
In the index test domain, two studies have an unclear risk of bias, as the authors do not state clearly whether the test interpretation was 'blinded' to the results of the reference test. Additionally, in one case, the authors used two distinct assays to measure Ca125 levels (Duraker et al. 2007, Wang and Tian 2014, Gu et al. 2015). In one case, we assessed the applicability of the index test as low, since the authors used two different cut-off points for Ca125 (Sakamoto et al. 1987).
In the remaining domains and applicability concerns, we evaluated the bias risk as low.
The funnel plots are shown in Supplementary Figures 1 and 2. The performed trim and fill method excluded plot asymmetry for both biomarkers (p = 0.14 for CA19-9 and p = 0.11 for Ca125).

Meta-analysis
We identified 230 potential articles through the literature databases. After removing duplicates and irrelevant records, 22 studies remained. These were screened by abstract and/or full text for eligibility. After reviewing them against our criteria, nine studies were finally included in the meta-analysis (Haglund 1986, Sakamoto et al. 1987, Cwik et al. 2006, Duraker et al. 2007, Chan et al. 2014, Wang and Tian 2014, Gu et al. 2015, Hogendorf et al. 2017). The detailed flow diagram is depicted in Figure 1. The meta-analysis included four European studies, one from the United States and four Asian studies. Five studies were designed as cohort studies, while four had a case-control design (Table 1). Overall, they included 1599 patients, of whom 975 had PDAC (61%), while the control group consisted of 624 patients (39%). Of the controls, 261 had chronic pancreatitis, 102 other benign pancreatic diseases/other benign diseases (Chan et al. 2014, Gu et al. 2015), 77 acute pancreatitis, 50 cholelithiasis, 41 pancreatic cyst, 23 cholangiocarcinoma, 19 pancreatic pseudocyst, 10 pancreatic cystic neoplasm and one patient was diagnosed with pancreatic arteriovenous malformation. Furthermore, one study enrolled 40 healthy participants (Gu et al. 2015).
The summary forest plots are shown in Figure 2. As depicted, the studies vary significantly regarding reported sensitivity and specificity for both CA19-9 and Ca125.
Additionally, we calculated DOR, positive and negative likelihood ratios for all included studies. The results are presented in Table 3.
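The relationship between sensitivity, specificity and the per-study likelihood ratios and DOR can be sketched as follows. The function name and the input values are hypothetical and serve only to show the arithmetic; they are not entries of Table 3.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR, DOR) from a sensitivity/specificity pair.

    Assumes 0 < sensitivity < 1 and 0 < specificity < 1, so that no
    division by zero occurs.
    """
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                         # diagnostic odds ratio
    return plr, nlr, dor


# Illustrative values only
plr, nlr, dor = likelihood_ratios(sensitivity=0.6, specificity=0.9)
print(plr, nlr, dor)
```

A PLR well above 1 and an NLR well below 1 both indicate a more informative test; the DOR condenses the two into a single number.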
We then calculated hierarchical summary ROC for both biomarkers. The curves are shown in Figure 3.
The point estimate for CA19-9 has the following parameters: Sensitivity: 0.748 (95%CI: 0.676-0.809). Specificity: 0.782 (95%CI: 0.716-0.836). The area under the curve (AUC) was estimated at 0.832. The calculated hsROC was further used to calculate the mean DOR, PLR and NLR. As shown in the curve comparison (Figure 3), the point estimates are well separated, with only a few studies overlapping, suggesting that CA19-9 indeed performs significantly better than Ca125. Nevertheless, we aimed to elucidate the influence of heterogeneity on the pooled diagnostic accuracy.
As suggested by other authors, Spearman's correlation between sensitivity and false positive rate (fpr) was calculated. The Spearman rho was 0.545 and 0.764 for CA19-9 and Ca125, respectively, indicating a possibly significant threshold effect for Ca125 (rho ≥ 0.7).
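The threshold-effect check described above can be sketched as a Spearman rank correlation between per-study sensitivity and false positive rate. The implementation below uses only the standard library; the function names and the four study values are hypothetical and do not reproduce the rho values reported in the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    # Spearman's rho is the Pearson correlation of the ranks
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative per-study values only
sens = [0.50, 0.60, 0.70, 0.80]
fpr = [0.10, 0.20, 0.15, 0.30]
rho = spearman_rho(sens, fpr)
print(rho)
```

A strongly positive rho means that studies reporting higher sensitivity also report a higher false positive rate, which is the pattern expected when studies differ mainly in the cut-off they apply.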

Heterogeneity analysis
To further explore the studies' heterogeneity, we performed a meta-regression. We chose a priori the following factors as possible sources of heterogeneity: i. Calculated cut-off point for Ca125 vs. standard cut-off point. ii. Study location (Asia vs. Europe/USA). iii. Study type (cohort vs. case-control studies). iv. Publication year (before vs. after 2010). v. Method of biomarker assessment. vi. PDAC prevalence in the study population.
Table note: 95% confidence intervals are given in brackets. Bold font indicates the higher value of sensitivity/specificity in the comparison of Ca125 and CA19-9 within a given study.
As shown in Table 1, all the studies published before 2010 used a type of radioimmunoassay for biomarker assessment, thus the split into subgroups for points iv and v is identical.
We did not find any statistically significant impact of study location, type or publication year (i.e. method of biomarker assessment) on the sensitivity or specificity of Ca125 (Supplementary Table 1). However, the meta-regression model showed that studies with a calculated cut-off point and higher PDAC prevalence (>60%) tend to report higher sensitivity for Ca125 (p = 0.021 and 0.04, respectively). To further assess the significance of these differences, a likelihood-ratio test was performed, which found the differences between the bivariate models (general parametric model vs. parametric model with a covariate) to be insignificant (p = 0.153 and p = 0.2, respectively). Similarly, in the univariate subgroup analysis, the calculated differences were insignificant. The pooled sensitivity and specificity for the studies estimating a cut-off point value for Ca125 (n = 3) were 0.696 (0.573-0.796) and 0.676 (0.53-0.794), respectively. For the studies without optimal cut-off point estimation, these values were 0.539 (0.432-0.642) and 0.784 (0.721-0.836) (p = 0.055 and 0.056, respectively).
Interestingly, the meta-regression for CA19-9 revealed that studies with a calculated cut-off point for Ca125 reported lower sensitivity for CA19-9 (p < 0.0001), while studies conducted in Europe/USA had significantly lower sensitivity and significantly higher specificity than the Asian ones (p = 0.032 and p = 0.038, respectively). Finally, the older studies (before 2010) were characterized by higher sensitivity (p = 0.016) (Supplementary Table 2). However, the likelihood-ratio test did not confirm the significance of the observed differences (p = 0.16, p = 0.2, and p = 0.075, respectively).
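The likelihood-ratio comparison of nested models mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: the log-likelihood values are hypothetical, and the choice of 2 degrees of freedom assumes that a single binary covariate adds one coefficient for sensitivity and one for false positive rate in the bivariate model.

```python
import math


def lrt_p_value(loglik_null, loglik_alt, df):
    """Likelihood-ratio test of a model without vs. with a covariate.

    Uses closed-form chi-square survival functions, so only df = 1 and
    df = 2 are supported in this sketch.
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch supports df = 1 or df = 2 only")
    return stat, p


# Hypothetical log-likelihoods of the two fitted bivariate models
stat, p = lrt_p_value(loglik_null=-10.0, loglik_alt=-8.0, df=2)
print(stat, p)
```

A covariate can have a small coefficient p-value and still fail this test, because the test asks whether the model as a whole fits better once the extra parameters are paid for.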

Systematic review of combined diagnostic tests
The designed tests are summarized in Table 4. Apart from the study by Wang et al., all the reviewed articles proposed a combination test of Ca125 with the other measured biomarkers. Four older papers examined simple AND/OR formulae that took into account CA19-9 and Ca125 levels. While the application of the AND formula caused a significant increase in the specificity of the test with a concomitant decrease in sensitivity, the OR formula had the inverse impact on the test's parameters. Though maximization of one parameter at the cost of the other might seem promising, in all cases, apart from the model from the study by Sakamoto et al. (using the AND formula), the accompanying decrease was greater than the resulting increase, so that the proposed combinations did not outperform the diagnostic accuracy of CA19-9. On the other hand, three more recent studies used a logistic regression model. All the designed tests succeeded in improving sensitivity over CA19-9. While the test constructed by Chan et al. managed to outperform CA19-9 sensitivity without any 'loss' of specificity, both combination models devised in our department do so at the cost of significantly lower specificity.
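The opposite effects of the AND and OR formulae can be sketched on a small patient-level data set. Everything in this example is hypothetical: the eight patients, the function name, and the cut-off values passed in (37 for CA19-9 and 35 for Ca125 are used purely as illustrative thresholds, not as the cut-offs of any reviewed study).

```python
def combined_accuracy(patients, cut_ca199, cut_ca125, rule):
    """Sensitivity and specificity of an AND/OR rule on two biomarkers.

    `patients` is a list of (ca199, ca125, has_pdac) tuples.
    """
    if rule not in ("AND", "OR"):
        raise ValueError("rule must be 'AND' or 'OR'")
    tp = fp = fn = tn = 0
    for ca199, ca125, has_pdac in patients:
        pos_ca199 = ca199 > cut_ca199
        pos_ca125 = ca125 > cut_ca125
        if rule == "AND":
            positive = pos_ca199 and pos_ca125
        else:
            positive = pos_ca199 or pos_ca125
        if has_pdac and positive:
            tp += 1
        elif has_pdac:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical patients: (CA19-9, Ca125, PDAC confirmed?)
patients = [
    (120, 50, True), (80, 20, True), (30, 60, True), (20, 10, True),
    (50, 15, False), (10, 40, False), (15, 12, False), (25, 8, False),
]
and_sens, and_spec = combined_accuracy(patients, 37, 35, "AND")
or_sens, or_spec = combined_accuracy(patients, 37, 35, "OR")
print(and_sens, and_spec, or_sens, or_spec)
```

In this toy data set the AND rule trades sensitivity for specificity and the OR rule does the reverse, which mirrors the pattern described for the older studies.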
The test reported by Gu et al. stands somewhat apart from the other combinations, as the reported joint detection of CA19-9, Ca125, CEA and CA242 should lead to increase of both sensitivity and specificity. Unfortunately, the authors did not provide any information about the mathematical rationale behind their test.

Conclusions
The conducted meta-analysis did not find Ca125 to be superior to CA19-9 in the diagnosis of PDAC. It should, however, be noted that due to the sparsity of studies comparing both biomarkers, high heterogeneity and different control groups, the results should be taken with a certain amount of scepticism.
Table 4 notes: The differences between each test's sensitivity and specificity and those of CA19-9 are given in brackets. Bold font marks the studies proposing a combined test that outperforms CA19-9 in terms of overall performance. a Study by Wang et al. proposed a combined test of … Marks the studies with a combined test outperforming CA19-9 in terms of both sensitivity and specificity.
From the clinical point of view, one of the most important factors contributing to the differences in the estimation of Ca125 accuracy is study design. Some of the included studies, such as ours, dealt with the diagnosis of the aetiology of an encountered pancreatic mass. Others, especially the case-control studies, took a broader approach to the problem by comparing biomarker levels between PDAC and benign disease, often of extra-pancreatic origin. In our opinion, the core question is rather whether the aetiology of a pancreatic mass can be ascertained by measurement of serum biomarkers, as benign diseases obviously require less invasive treatment methods and, as already stated, misdiagnosis can have grave consequences for patients.
Further, it should be noted that most of the included studies have moderately sized study populations and in two cases PDAC cases are underrepresented.
Apart from the significant threshold effect for Ca125, the meta-regression did not identify any other significant factors contributing to the heterogeneity. Nevertheless, it should be noted that there is some evidence that disease prevalence in the study population has an impact on the sensitivity and specificity of a diagnostic test (Leeflang et al. 2009, 2013). As for optimal cut-off calculation, methods like Youden's index can be applied to maximize diagnostic accuracy. Unfortunately, most of the included studies used merely the cut-off point suggested by the test manufacturer's protocol. It would be interesting to compare the diagnostic accuracy resulting from the application of the optimal cut-off point with the parameters calculated for the recommended cut-off point.
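The cut-off selection by Youden's index mentioned above can be sketched as follows. Youden's J is sensitivity + specificity - 1, and the optimal cut-off is the candidate value that maximizes it. The biomarker values and labels below are hypothetical, and treating a value equal to the cut-off as test-positive is a convention chosen for this sketch.

```python
def youden_optimal_cutoff(values, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    `values` are biomarker levels, `labels` are True for diseased patients.
    Assumes both diseased and non-diseased patients are present.
    Ties are resolved in favour of the lowest cut-off.
    """
    best = None
    for cut in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y and v >= cut)
        fn = sum(1 for v, y in zip(values, labels) if y and v < cut)
        tn = sum(1 for v, y in zip(values, labels) if not y and v < cut)
        fp = sum(1 for v, y in zip(values, labels) if not y and v >= cut)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cut, j, sens, spec)
    return best


# Hypothetical biomarker levels: four controls followed by four PDAC cases
values = [5, 12, 18, 50, 22, 45, 60, 80]
labels = [False, False, False, False, True, True, True, True]
best = youden_optimal_cutoff(values, labels)
print(best)
```

Because the cut-off is tuned on the same data it is evaluated on, the resulting accuracy tends to be optimistic, which is one reason why studies with calculated cut-offs may report higher sensitivity.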
In recent years, Ca125 has emerged as a promising target in pancreatic cancer research. Of note, there is ample evidence that MUC16, from which Ca125 originates, can have a vital role in PDAC development (Das et al. 2015). For this reason, targeting MUC16 in PDAC immunotherapy might be promising (Garg et al. 2014, Aithal et al. 2018). Regarding the analysis of heterogeneity in CA19-9, the preliminarily observed difference between reported sensitivity and specificity in European/USA and Asian studies could be attributed to the different prevalence of Lewis-antigen non-secretors in the populations. Our study from 2018 (Hogendorf et al. 2018), where a CA19-9 sensitivity of 52.38% was reported, may account for the observed difference between studies with a calculated cut-off point for Ca125 (n = 3) and those without (n = 6). These differences did not, however, prove to be significant when comparing the bivariate models.
Furthermore, the impact of the low number of studies included in the meta-analysis should be acknowledged. First, it definitely influences the publication bias analysis, as a rule of thumb states that 10 or more studies are required to perform an accurate one (Dalton et al. 2016). Second, the sparsity of studies can make the hsROC model difficult to fit and result in unreliable parameter estimates (Takwoingi et al. 2017).
The point estimate in the hsROC model for CA19-9 has sensitivity of around 75% and specificity of 78%, which is quite similar to the results from the previous meta-analysis (Huang and Liu 2014). While these numbers indicate fairly good diagnostic accuracy of CA19-9, they are definitely too low to accept it as a standard for PDAC diagnosis. Thus, further research into pancreatic cancer biomarkers is crucial to the improvement of current epidemiology.
While CA19-9 and Ca125 are normally tested in the scenario of 'immediate' diagnosis of pancreatic cancer, O'Brien et al. (2015) analysed serum levels of CA19-9 and Ca125 of 458 post-menopausal women. One hundred and fifty-four of them were subsequently diagnosed with PDAC, the rest served as matched non-cancer control. The authors proved that a model of CA19-9 > 37 IU/mL OR Ca125 > 30 IU/mL has a sensitivity of 95.2% and specificity of 57.1% in 'diagnosing' PDAC 0-1 year (average time: 6 months) before the initial diagnosis was made.
It should also be noted that the usefulness of both CA19-9 and Ca125 goes beyond the diagnosis of PDAC. A growing number of studies show that monitoring of CA19-9 and Ca125 can serve as a prognostic factor of survival or as an indicator of recurrence. The aforementioned study by O'Brien et al. showed that patients with CA19-9 > 40 IU/mL had a median survival time from sample collection of 14.5 vs. 36.0 months for the non-elevated group. Ca125 > 25 IU/mL was correlated with a median survival time of 14 months vs. 35 months. In another recent study, preoperative Ca125 ≥ 18.4 IU/mL was associated with poorer surgical outcomes. There is also ample evidence that both biomarkers can serve as predictors of chemotherapy response and recurrence of PDAC (Nishio et al. 2017, Xu et al. 2017).
To conclude, the gathered evidence is rather insufficient to undoubtedly state that Ca125 is significantly inferior to CA19-9 in terms of diagnostic accuracy. The most important problem here is the studies' sparsity, as this can result in the suboptimal fit of hsROC model and lead to biased conclusions.
Nevertheless, since the hsROC curves are only minimally overlapping, the trend towards CA19-9 superiority, especially in case of higher sensitivity, should be appreciated.
In order to fully validate the usefulness of Ca125 in the diagnosis of PDAC, larger, well-designed studies are needed.
The review of combined tests shows that a fairly simplistic mathematical model like logistic regression applied to a CA19-9/Ca125-based biomarker panel can significantly increase the diagnostic accuracy. While the results of systematic review are insufficient to state whether a mere combination of CA19-9 and Ca125 would be enough to significantly increase the accuracy, a theoretical panel based either on one or both of them could prove to be extremely valuable due to its simplicity and cost-effectiveness.

Disclosure statement
The authors declare that they have no conflicts of interest.

Data availability statement
All data generated or analysed during this study are available from the corresponding author (Aleksander Skulimowski) on personal request.