Development and validation of a diagnostic model for AFP-negative hepatocellular carcinoma

AFP appears to be negative in about 30% of all hepatocellular carcinoma (HCC) cases. Our study aimed to develop a nomogram model to diagnose AFP-negative HCC (AFPN-HCC). The training set included 294 AFPN-HCC patients, 159 healthy controls, 63 patients with chronic hepatitis B (CHB), and 64 patients with liver cirrhosis (LC). The validation set enrolled 227 AFPN-HCC patients, 137 healthy controls, 47 CHB patients, and 45 patients with LC. LASSO, univariate, and multivariable logistic regression analyses were performed to construct the model, which was then transformed into a visualized nomogram. Receiver operating characteristic (ROC) curves, the calibration curve, decision curve analysis (DCA), and the clinical impact curve (CIC) were further used for validation. Four variables, namely age, PIVKA-II, platelet (PLT) count, and prothrombin time (PT), were selected to establish the nomogram. The area under the ROC curve (AUC) for distinguishing AFPN-HCC patients was 0.937 (95% CI 0.892–0.938) in the training set and 0.942 (95% CI 0.921–0.963) in the validation set. We also found that the model had high diagnostic value for small-size HCC (tumor size < 5 cm) (AUC = 0.886) and HBV surface antigen-positive AFPN-HCC (AUC = 0.883). Our model was effective for discriminating AFPN-HCC from benign liver diseases and healthy controls, and might be helpful for the diagnosis of AFPN-HCC.


Introduction
Hepatocellular carcinoma (HCC) is the dominant histological type of liver cancer and accounts for 75-85% of all cases. According to the global cancer statistics 2022, HCC is the sixth most common malignancy and the third leading cause of cancer-related death in the world (Siegel et al. 2022). Surgical resection is still the preferred treatment for HCC but is not suitable for patients with advanced-stage disease (Gluer et al. 2012). Unfortunately, owing to the occult onset of HCC and the lack of specific early markers, most patients are diagnosed at an advanced stage, which is associated with high recurrence and metastasis rates and a poor prognosis (Lee et al. 2022). Accurate surveillance and differential diagnosis of HCC can significantly improve patient survival.
As the prognosis of HCC depends largely on the stage at which the tumor is detected, early detection of HCC is critical to improving the survival of affected patients. Professional society guidelines from the American Association for the Study of Liver Diseases (AASLD) and the European Association for the Study of the Liver (EASL) both suggest liver ultrasonography (US) as the first level of surveillance for patients at high risk of developing HCC (Heimbach et al. 2018; EASL-EORTC Clinical Practice Guidelines 2012). However, the effectiveness of US depends mainly on the experience of the operator, and it is difficult to distinguish tumors from liver cirrhosis nodules, which are closely correlated with HCC (Simmons et al. 2017). A meta-analysis found that the sensitivity of ultrasound was only 47% (95% CI 33%-61%) for the detection of early-stage HCC (Tzartzeva et al. 2018). Magnetic resonance imaging (MRI), computed tomography (CT), and other cross-sectional imaging techniques have higher accuracy, but they are relatively expensive for widespread screening (Ichikawa et al. 2014).
Biomarker assessments are more objective, easily accessible, and noninvasive tools for HCC diagnosis. Of all biomarkers, alpha-fetoprotein (AFP) is the most widely used serological indicator for HCC worldwide (Zheng et al. 2020). However, about 30% of all HCC patients do not show elevated serum AFP, and AFP can also be elevated in benign liver diseases such as chronic hepatitis B (CHB) and liver cirrhosis (LC) (Hu et al. 2022). These facts have sparked controversy about the use of AFP, and the EASL no longer recommends the use of AFP during HCC surveillance (EASL Clinical Practice Guidelines 2018). A delayed diagnosis of AFP-negative HCC (AFPN-HCC) frequently leads to delayed treatment and, subsequently, to serious consequences.
The detection of circulating biomarkers associated with AFPN-HCC can improve diagnostic accuracy and overcome the disadvantages of current diagnostic strategies (Chen et al. 2018). Prothrombin induced by vitamin K absence II (PIVKA-II), also known as des-γ-carboxy prothrombin (DCP), is an abnormal prothrombin molecule that is increased in HCC and has been used as a biological marker (Feng et al. 2021). Evidence has been presented that PIVKA-II can improve the positive rate of diagnosis for AFPN-HCC patients (Qi et al. 2020). However, serum PIVKA-II levels are also increased in the absence of HCC because of vitamin K deficiency or the use of vitamin K antagonists (Kondo et al. 2020), which limits its clinical application as a marker of HCC. The Lens culinaris agglutinin-reactive fraction of alpha-fetoprotein (AFP-L3) is a glycosylated subfraction of AFP and is a more specific indicator of HCC than total AFP (Toyoda et al. 2011). However, previous studies have indicated that AFP-L3 has low diagnostic sensitivity in cases where AFP is not markedly elevated (Hu et al. 2013). Moreover, the strategy of combining multiple biomarkers has been shown to significantly enhance diagnostic performance (Caviglia et al. 2016; Terling et al. 2009). Johnson et al. developed a novel diagnostic model based on gender, age, and the three above-mentioned biomarkers (named the GALAD score) in a British cohort in 2014 (Johnson et al. 2014). Since then, many studies have been conducted to comprehensively assess the diagnostic performance of the GALAD score in differentiating any-stage or early-stage HCC. In an international cohort of 6834 patients (2430 with HCC and 4404 with chronic liver disease), GALAD achieved sensitivities ranging from 60 to 80% for early HCC detection (Berhane et al. 2016). However, numerous studies have focused on early-stage HCC or on populations with different etiologies and have overlooked AFPN-HCC (Schotten et al. 2021; Singal et al. 2022). Liu et al. (2020) developed a serum-based GAAP model, based on gender, age, AFP, and PIVKA-II, for surveillance of HCC, and the model had an AUC value of 0.888 for discriminating AFPN-HCC from the entire control group. Although the data are promising, confirmation of the clinical effectiveness in larger studies is needed.
The purpose of this study was to develop a diagnostic model combining multiple biomarkers or serological examinations for AFPN-HCC via LASSO regression analysis, univariate logistic regression analysis, and multivariable logistic regression analysis. The diagnostic model was then transformed into a visualized nomogram, and its discrimination, calibration, and net benefit were further validated. External validation was performed on the validation set. We hope that the diagnostic model can be applied to the diagnosis of AFPN-HCC.

Study population
All HCC patients confirmed by postoperative pathology from January 2019 to May 2022 were included in this study. After screening according to the inclusion and exclusion criteria, 294 AFPN-HCC patients from the First Affiliated Hospital of Fujian Medical University were selected as the training set and 227 AFPN-HCC patients from Fujian Provincial Hospital were enrolled as the validation set for this retrospective study. The training set also included a cohort of 159 healthy controls, 63 patients with CHB, and 64 patients with LC as controls. The validation set from Fujian Provincial Hospital additionally included 137 healthy controls, 47 CHB patients, and 45 patients with LC. Written informed consent was obtained from all subjects. All procedures were in accordance with the Declaration of Helsinki. The development of the study followed the criteria of the TRIPOD statement (Collins et al. 2015). The Ethics Committee of the First Affiliated Hospital of Fujian Medical University reviewed and approved this study (2018[048]).

Inclusion criteria and exclusion criteria
The HCC inclusion criteria were as follows: (1) postoperative pathological diagnosis of HCC; (2) preoperative AFP negative (< 20 ng/mL); (3) first onset without any anticancer treatment before hepatectomy. The exclusion criteria were as follows: (1) a history of other types of cancer; (2) recurrent HCC; and (3) missing data.
The inclusion criteria of patients with CHB were based on the Asian-Pacific clinical practice guidelines on the management of hepatitis B (Sarin et al. 2016): (1) age above 18; (2) positive hepatitis B surface antigen for over 6 months. The exclusion criteria were listed as follows: (1) liver diseases with other hepatitis virus infection or other liver diseases; and (2) imaging or blood evidence of HCC.
The healthy controls were individuals who underwent physical examination with no family history of cancer and liver-related diseases and normal results of routine tests including full blood count, electrolytes, liver and kidney function tests, coagulation function, AFP, CEA, and PIVKA-II.

Laboratory examination and data collection
Demographic data (age and gender) were obtained from the electronic medical records. Laboratory examination data, including complete blood count, electrolytes, liver and kidney function, coagulation function, AFP, CEA, and PIVKA-II, were obtained within 1 week before surgery from the Laboratory Information System (LIS). Serum AFP and CEA were measured using an electrochemiluminescence detection system (Roche, Basel, Switzerland); PIVKA-II was measured using a chemiluminescent microparticle immunoassay (Abbott Laboratories, IL, USA). The complete blood count was performed using an ADVIA 2120 automatic blood analyzer (Siemens, Erlangen, Germany). Biochemical indexes were measured using the Cobas-8000 automatic biochemical analyzer (Roche Diagnostics, Basel, Switzerland). Automated coagulation tests were performed using CS5100 coagulometric auto analyzers (Sysmex, Kobe, Japan). The laboratory methods for each test are provided in Supplementary Table 1.

Statistical methods
All statistical analyses were performed using SPSS 26.0 (IBM Corporation, 2020, USA) and R (version 4.2.0, R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/). Post hoc assessment of sample size was performed via the empirical events per variable (EPV) method for logistic regression analysis, for which an EPV of at least 10 is generally recommended (Peduzzi et al. 1996). In this study, 12 factors were included in the multivariate analysis, so 120 samples were needed according to EPV; the numbers of patients included in the training set (n = 294) and validation set (n = 227) were therefore sufficient. Continuous variables are reported as mean ± standard deviation (SD) for normally distributed data, or as median and interquartile range for nonnormally distributed data. Categorical covariates are expressed as percentages. LASSO and univariate logistic regression analyses were used to identify individual factors associated with AFPN-HCC. Variables with P < 0.05 were then selected for multivariate analysis. A visualized nomogram was constructed based on the results of the multivariate model. The predictive performance of the models was measured by receiver operating characteristic (ROC) curve analysis. DeLong's test was used to compare the AUCs of the models. A calibration curve was generated to evaluate calibration. Decision curve analysis (DCA) and the clinical impact curve (CIC) were used to determine the clinical benefit of the model.
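The EPV arithmetic in this section can be checked with a short sketch. Python is used here purely for illustration (the study itself used SPSS and R), and the function names are hypothetical. One assumption of the sketch is that EPV is computed from the less frequent outcome class, so the control counts are summed from the cohort description (159 + 63 + 64 and 137 + 47 + 45).

```python
def required_events(n_predictors: int, epv: int = 10) -> int:
    """Minimum number of events needed for a target events-per-variable ratio."""
    return n_predictors * epv


def observed_epv(n_cases: int, n_controls: int, n_predictors: int) -> float:
    """EPV based on the less frequent outcome class (an assumption of this sketch)."""
    return min(n_cases, n_controls) / n_predictors


# 12 candidate factors with the conventional EPV >= 10 rule
print(required_events(12))
# Training set: 294 AFPN-HCC cases vs 159 + 63 + 64 controls
print(observed_epv(294, 159 + 63 + 64, 12))
# Validation set: 227 AFPN-HCC cases vs 137 + 47 + 45 controls
print(observed_epv(227, 137 + 47 + 45, 12))
```

Under either reading of "events" (cases only, or the smaller class), both cohorts stay above the EPV threshold of 10 for 12 candidate factors.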

Characteristics of patients
After screening against the inclusion and exclusion criteria, a total of 580 subjects from the First Affiliated Hospital of Fujian Medical University were enrolled in the training set, including 294 AFPN-HCC patients, 159 healthy controls, 63 patients with CHB, and 64 patients with LC. The validation set from Fujian Provincial Hospital consisted of 137 healthy controls, 47 CHB patients, and 45 patients with LC. A summary of the study design is presented in Fig. 1. Baseline information, including demographic data and clinical laboratory data of all subjects, is presented in Table 1. Across all 56 variables, there was no significant difference (P > 0.05) between the training set and the validation set. The results indicated that the two data sets come from the same distribution and can be used for model construction and validation.

Identification of independent variables significantly associated with AFPN-HCC
To identify individual factors associated with AFPN-HCC, univariate analysis was performed in the training set. After screening, 17 variables with statistical significance (P < 0.05), namely age, gender, CEA, PIVKA-II, monocyte counts, neutrophil counts, PLT counts, WBC counts, MCHC, RDW, ALP, CK, CREA, GLU, APTT, Fg, and PT, were brought into the next analysis (Table 2). Summary statistics for all independent variables are presented in Supplementary Table 2. Given the large number of variables, LASSO regression was used for dimensionality reduction to further screen AFPN-HCC-related factors from the univariate analysis results. Figure 2A shows the path of all candidate variable coefficients included in the model according to the log-transformed λ; as the penalization coefficient λ increased, the number of nonzero coefficients tended toward zero. The optimal λ in the LASSO model was identified using tenfold cross-validation and the minimum criterion. The confidence interval (CI) under each λ is shown in Fig. 2B. Variables with nonzero coefficients were considered to have strong predictive potential in the LASSO penalized regression model. As a result, the 17 significant factors from the univariate regression were entered into the LASSO regression, and 8 key variables (age, gender, PIVKA-II, monocyte counts, PLT counts, ALP, PT, and MCHC) were retained (Table 3). All 8 variables were compared between the HCC, LC, CHB, and HC groups (Fig. 2C-J). Age, PIVKA-II, PLT counts, and PT were higher in the HCC group than in the other three groups (P < 0.05). For monocyte counts, ALP, and MCHC, there was no significant difference between HCC, LC, and CHB (P > 0.05). However, monocyte counts (P < 0.001) and ALP level (P < 0.0001) were higher in HCC than in HC, while the level of MCHC (P < 0.0001) was lower in patients with HCC than in HC.
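The LASSO screening step, in which larger values of λ push more coefficients to exactly zero and λ is chosen by tenfold cross-validation under the minimum criterion, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: it is written in Python with NumPy rather than the R routine used in the study, it fits the L1-penalized logistic model with a simple proximal gradient (ISTA) solver, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def fit_l1_logistic(X, y, lam, n_iter=300):
    """L1-penalized logistic regression via proximal gradient descent (ISTA)."""
    n, p = X.shape
    Xa = np.hstack([X, np.ones((n, 1))])           # last column is the intercept
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2    # 1 / Lipschitz constant of the loss
    theta = np.zeros(p + 1)
    for _ in range(n_iter):
        grad = Xa.T @ (sigmoid(Xa @ theta) - y) / n
        theta = theta - step * grad
        w = theta[:p]                              # soft-threshold weights, not intercept
        theta[:p] = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return theta[:p], theta[p]


def cv_lambda_min(X, y, lambdas, k=10, seed=0):
    """Pick the lambda with the lowest mean held-out log-loss over k folds."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    mean_loss = []
    for lam in lambdas:
        losses = []
        for f in range(k):
            train, test = folds != f, folds == f
            w, b = fit_l1_logistic(X[train], y[train], lam)
            p = np.clip(sigmoid(X[test] @ w + b), 1e-12, 1 - 1e-12)
            losses.append(-np.mean(y[test] * np.log(p) + (1 - y[test]) * np.log(1 - p)))
        mean_loss.append(np.mean(losses))
    return lambdas[int(np.argmin(mean_loss))]


# Synthetic data: 6 standardized candidate variables, only the first two carry signal.
rng = np.random.default_rng(42)
X = rng.standard_normal((300, 6))
true_w = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = (rng.random(300) < sigmoid(X @ true_w)).astype(float)

lambdas = [0.3, 0.1, 0.03, 0.01]
best = cv_lambda_min(X, y, lambdas)
w_best, _ = fit_l1_logistic(X, y, best)
print("selected lambda:", best, "nonzero coefficients at:", np.flatnonzero(w_best))
```

Variables whose coefficient is still nonzero at the selected λ are the ones carried forward, which mirrors the reduction from 17 candidates to 8 described here.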

Developing and visualizing a multivariate logistic regression model
A total of 8 variables were included in the multivariate logistic regression analysis to evaluate the combined effects of multiple factors. To facilitate clinical use, PIVKA-II was transformed into four levels (1, 2, 3, 4) according to quartile. Finally, four significant predictors (age, PLT counts, PT, PIVKA-II) were included in the final model. The OR, 95% CI, and statistical significance for each variable are presented in Table 4. Multivariate logistic regression analysis showed that age (OR = 1.082, P < 0.001), PIVKA-II (OR = 6.318, P < 0.001), PLT (regression coefficient β = -0.006, P = 0.01), and PT (β = -0.588, P = 0.003) could be used to construct a diagnostic model of AFPN-HCC. We then established a nomogram for AFPN-HCC diagnosis including these four independent factors based on the multivariate logistic regression analysis (Fig. 2K). The use of the nomogram is simple. First, the points corresponding to each variable are marked; then the points are summed to give the total points; finally, the predicted probability corresponding to the total points is read off.
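How a nomogram turns a logistic model into points can be sketched as follows. The coefficients, intercept, and axis ranges below are hypothetical placeholders chosen for illustration; they are not the fitted values in Table 4, so the resulting points and probability will not match Fig. 2K. The sketch assumes the usual construction in which the predictor with the widest effect range spans the 0-100 point axis.

```python
import math

# Hypothetical coefficients and axis ranges, for illustration only.
# They are NOT the fitted values reported in Table 4.
INTERCEPT = 1.0
MODEL = {
    "age":            {"beta": 0.08,   "lo": 20.0, "hi": 90.0},
    "pivka_quartile": {"beta": 1.8,    "lo": 1.0,  "hi": 4.0},
    "plt":            {"beta": -0.006, "lo": 50.0, "hi": 400.0},
    "pt":             {"beta": -0.6,   "lo": 9.0,  "hi": 16.0},
}

# The predictor with the widest effect range spans the full 0-100 point axis.
MAX_SPAN = max(abs(v["beta"]) * (v["hi"] - v["lo"]) for v in MODEL.values())


def reference(v):
    """Axis end that scores 0 points: low end if beta > 0, high end otherwise."""
    return v["lo"] if v["beta"] > 0 else v["hi"]


def points(patient):
    """Points per variable, proportional to its contribution to the linear predictor."""
    return {k: 100.0 * v["beta"] * (patient[k] - reference(v)) / MAX_SPAN
            for k, v in MODEL.items()}


def probability_from_points(total_points):
    """Map total points back to the logistic probability."""
    base = INTERCEPT + sum(v["beta"] * reference(v) for v in MODEL.values())
    linear_predictor = base + total_points * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


patient = {"age": 65, "pivka_quartile": 4, "plt": 185, "pt": 11.8}
pts = points(patient)
total = sum(pts.values())
print(pts)
print(total, probability_from_points(total))
```

Because points are a rescaled linear predictor, a negative coefficient (as for PLT and PT) simply reverses the axis: higher values earn fewer points.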

Validation of the prediction model
The receiver operating characteristic (ROC) curve was used to assess the diagnostic value of the model. The AUC for distinguishing HCC patients from controls was 0.937 (95% CI 0.917-0.956) in the training set and 0.942 (95% CI 0.9096-0.964) in the validation set (Fig. 3A, E). In addition, with a cut-off point maximizing the sum of sensitivity and specificity, the model achieved sensitivity and corresponding specificity values of 0.902 and 0.854 in  (Fig. 3I). HBV infection is associated with HCC, and HBV infection was defined as positivity for HBV surface antigen (HBsAg). Therefore, we also plotted the ROC curve for HBsAg-positive AFPN-HCC patients from the training set, and the AUC was 0.883 (95% CI 0.851-0.913) (Fig. 3J). To evaluate the agreement between the predicted probability and the observed fraction of outcomes, the calibration curve was plotted. The calibration curve (Fig. 4A, B) demonstrated good agreement between the actual observations and the predicted probabilities of AFPN-HCC, and the nomogram model appeared to be well calibrated in the training set (mean absolute error = 0.016) and the validation set (mean absolute error = 0.010).
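The two ROC quantities used here, the AUC and the cut-off that maximizes the sum of sensitivity and specificity (the Youden index), can be illustrated with a small pure-Python sketch. The scores and labels are toy values, not study data, and the function names are hypothetical.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the probability that a case scores higher than a control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for each observed score used as cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rows = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        rows.append((t, tp / n_pos, 1.0 - fp / n_neg))
    return rows


def youden_cutoff(scores, labels):
    """Cut-off maximizing sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda r: r[1] + r[2] - 1.0)


# Toy predicted probabilities: 5 cases (label 1) and 6 controls (label 0).
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
print("AUC:", auc_mann_whitney(scores, labels))
print("cut-off:", cutoff, "sensitivity:", sensitivity, "specificity:", specificity)
```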
To assess the clinical practicality and usefulness of our model, DCA and CIC were conducted. DCA determines the clinical utility of the model by quantifying the net benefit at different threshold probabilities. With the extension of the model curve, the net benefit increases, and the results showed that the model yielded net benefits in both the training set (Fig. 4C) and the validation set (Fig. 4D). The CIC of the model in the training set (Fig. 4E) and validation set (Fig. 4F) showed that the predicted number of high-risk patients was always greater than the number with an outcome of HCC when the risk threshold was in the range of 0-0.3, and the cost-benefit ratios would be acceptable in the same range.
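The quantities behind DCA and CIC can be written down compactly. The sketch below assumes the standard net-benefit definition, NB = TP/n - (FP/n) × pt/(1 - pt) at threshold probability pt, and for the CIC simply counts patients classified as high risk alongside those among them who truly have the outcome. The data are toy values and the function names are hypothetical.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat everyone as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def clinical_impact(probs, labels, threshold):
    """(number classified high risk, number of those with the outcome)."""
    high_risk = [(p, y) for p, y in zip(probs, labels) if p >= threshold]
    return len(high_risk), sum(y for _, y in high_risk)


# Toy predicted risks for 4 cases (label 1) and 4 controls (label 0).
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for pt in (0.1, 0.3, 0.5):
    print(pt,
          net_benefit(probs, labels, pt),
          net_benefit_treat_all(labels, pt),
          clinical_impact(probs, labels, pt))
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).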

Comparison of the diagnostic efficacy of PIVKA-II and GAAP model
In order to investigate the different indicators' accuracies, ROC curves were drawn and the AUC comparison was performed using DeLong's test (Fig. 5, Table 5). As demonstrated previously, the level of PIVKA-II in the HCC group was significantly higher than that in other subsets (P < 0.001, Fig. 2E). The AUC value of PIVKA-II alone to diagnose HCC in the training set was 0.851, which was lower than our model (P < 0.001). The GAAP model provided an AUC value of 0.892 and it was lower than the AUC of our model (P = 0.012).
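For readers unfamiliar with DeLong's test, the following is a minimal sketch of the paired version, which compares two AUCs measured on the same subjects. It is a plain O(m × n) Python implementation written for illustration, not the R routine used in the study; the simulated scores and all function names are assumptions.

```python
import math
import random


def _psi(x, y):
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(pos, neg):
    """Structural components: per-case and per-control placement values."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Return (auc_a, auc_b, z, two-sided p) for two models scored on the same subjects."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    comps = []
    for s in (scores_a, scores_b):
        comps.append(_placements([s[i] for i in pos_idx], [s[i] for i in neg_idx]))
    (v10a, v01a), (v10b, v01b) = comps
    auc_a, auc_b = sum(v10a) / len(v10a), sum(v10b) / len(v10b)
    m, n = len(pos_idx), len(neg_idx)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:                       # identical rankings: no evidence of a difference
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, z, math.erfc(abs(z) / math.sqrt(2))


# Simulated scores: model A separates cases from controls better than model B.
random.seed(0)
labels = [1] * 200 + [0] * 200
model_a = [2.0 * y + random.gauss(0, 1) for y in labels]
model_b = [0.3 * y + random.gauss(0, 1) for y in labels]

auc_a, auc_b, z, p = delong_test(model_a, model_b, labels)
print(auc_a, auc_b, z, p)
```

The pairing matters: because both models are evaluated on the same patients, the covariance terms reduce the variance of the AUC difference relative to an unpaired comparison.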

Discussion
In this study, we constructed a nomogram prediction model including age, PIVKA-II, PLT, and PT through LASSO, univariate, and multivariate logistic regression analysis. An additional series of analyses was performed for model validation using the training set and the external validation set. It is well known that the AUC represents diagnostic efficacy: AUC values of 0.5-0.7 indicate limited diagnostic value, AUC values between 0.7 and 0.9 indicate moderate diagnostic value, and AUC values greater than 0.9 indicate high accuracy. To further verify the diagnostic power of the model, we compared its discrimination across different control populations (HC, CHB, and LC), and our data indicated that the model fit well. Furthermore, we analyzed the performance of the model in the diagnosis of AFPN-HCC patients with small-size (tumor size < 5 cm) and HBV-related AFPN-HCC. The model obtained good performance in diagnosing small-size AFPN-HCC and HBsAg (+) AFPN-HCC. Overall, the model presents excellent value in distinguishing AFPN-HCC patients from healthy controls and patients with benign liver diseases.
Table note: Data were presented as the odds ratio with the confidence interval. OR odds ratio; CI confidence interval; CEA carcinoembryonic antigen; PIVKA-II protein induced by vitamin K absence or antagonist-II; PLT platelet; WBC white blood cell; MCHC mean corpuscular hemoglobin concentration; RDW red blood cell distribution width; ALP alkaline phosphatase; CK creatine kinase; CREA creatinine; GLU glucose; APTT activated partial thromboplastin time; Fg fibrinogen; PT prothrombin time
In this study, age was included in the final model, and the trend matched the clinical and epidemiological observation that increased age is regarded as an independent HCC risk factor (Kulik and El-Serag 2019). HCC patients often have a background of chronic hepatitis and liver cirrhosis, so liver function and liver reserve have been altered and damaged. Among the four key factors, PT and PLT are biomarkers related to liver function. PT is a very important index reflecting liver synthetic and reserve function, and PT increases with decreased synthesis of liver-derived coagulation factors (Amitrano et al. 2002; Matchar et al. 2015). The nomogram shows that a high PT corresponds to a low point value. One possible reason is that hospitalized CHB patients often have severe liver dysfunction, which may lead to a high PT. Another reason may be that the data of HCC patients were collected before surgery, when hepatic function was well preserved, so the PT results were mostly within normal ranges. Numerous predictive models based on PLT counts have been built to identify HCC risk, and the results showed that low platelet counts were associated with an increased risk of HCC (Serag et al. 2014). Because the liver cirrhosis background can ultimately lead to portal hypertension and hypersplenism, HCC patients showed a lower platelet count than healthy individuals.
For a long time, considerable effort has been devoted to searching for new biomarkers for the diagnosis of HCC. In clinical practice, AFP is the most commonly used serum marker to screen for and diagnose HCC, but AFP-based methods are unsuitable for patients with AFPN-HCC. Several studies have discovered novel potential biomarkers such as PIVKA-II, AFP-L3, Golgi protein 73 (GP73), squamous cell carcinoma antigen (SCCA), centromere protein F autoantibody (anti-CENPF), glypican 3, and a number of DNA, RNA, and protein biomarkers, but their clinical application awaits future large-scale validation studies (Ali et al. 2020; Liu et al. 2018, 2015; Hong et al. 2015; Anatelli et al. 2008; Chalasani et al. 2021). Among these, PIVKA-II has been found to be elevated in HCC patients and has high sensitivity and specificity for differentiating HCC from cirrhosis or chronic hepatitis (Perne et al. 2023). Previous studies have indicated that the AUC of PIVKA-II for diagnosing HCC ranged from 0.701 to 0.854, sensitivity from 0.51 to 0.77, and specificity from 0.678 to 0.912 (Lim et al. 2016; Poté et al. 2015). In our study, the AUC for PIVKA-II alone was 0.884 (95% CI 0.857-0.911), with a sensitivity of 0.46 and a specificity of 0.837 in the training set, which was consistent with the previous studies. However, as warfarin usage increases alongside the aging of the worldwide population, the value of PIVKA-II in HCC diagnosis will gradually decrease. Several lines of evidence have indicated that PIVKA-II does not have sufficient sensitivity as a single marker for routine use in HCC surveillance (Park et al. 2017; Piratvisuth et al. 2022).
Currently, the combination of multiple indicators has been investigated as candidate HCC biomarker. There was a consistent view that the combination of biomarkers is
Fig. 2 Selected candidate indicators and developed a visualized diagnosis model of AFPN-HCC. A Tuning parameter (λ) selection in the LASSO model used tenfold cross-validation via minimum criteria. Dotted vertical lines were drawn at the optimal values using the minimum criteria and the 1 standard error of the minimum criteria. B LASSO coefficient profiles of the 17 factors. A coefficient profile plot was generated against the log (Lambda) sequence. Vertical line represents the values selected where optimal lambda resulted in 8 nonzero coefficients. For gender (C), Chi-square tests were performed. For age (D), monocyte counts (F), PLT counts (G), and ALP (H), the multiple pairwise comparisons were made using Tukey's Method. For PIVKA-II (E), PT (I), and MCHC (J), Kruskal-Wallis tests were used for comparisons among groups. K The nomogram of the model. The red dot represents the patient's characteristics on each variable axis and they are projected to the top line to get the corresponded point. Each summation point corresponds to a predicted probability value in the horizontal line on the bottom of the nomogram. ***P < 0.001, **P < 0.01, *P < 0.05, ns P > 0.05, no significance. HCC hepatocellular carcinoma; LC liver cirrhosis; HC healthy controls; CHB, chronic hepatitis B; PIVKA-II protein induced by Vitamin K absence or antagonist-II; PLT platelet; ALP alkaline phosphatase; PT prothrombin time; MCHC mean corpuscular hemoglobin concentration
We found that the model presented a good performance in positive HBsAg AFPN-HCC patients (AUC, 0.883).
Another advantage of our model is that, after visualization as a nomogram, it is easy to compute without the use of a complicated formula. The nomogram provides an easy-to-understand clinical tool in which higher total points indicate a greater probability of being diagnosed with AFPN-HCC. For example, for a 65-year-old male (79 points) whose laboratory examination shows a PIVKA-II of 455 mAU/mL (assignment is 4, 100 points), a PLT of 185 × 10^9/L (67 points), and a PT of 11.8 s (78 points), the total is 324 points, which corresponds to a probability of 0.989 of being diagnosed with HCC.
This study has limitations. First, we concede that our study was only a retrospective analysis, but this analysis will serve as a basis for a prospective trial. Second, the training set and validation set were from the same area, which may have inevitably caused bias. The prediction accuracy could perhaps be improved in further, larger multi-center studies. In conclusion, we assessed age, gender, and 50 commonly used laboratory indices and identified a combination of biomarkers that may be of use in the diagnosis of AFPN-HCC. The model partially filled the diagnostic blind area of AFPN-HCC and showed potential for further improving detection rates for AFPN-HCC.