This study demonstrated the potential of AI to accurately predict the bile duct bifurcation site during PLDRH. The proposed model achieved 97% accuracy in clinical evaluation under 5-fold cross-validation and 93.3% accuracy on the test set, suggesting robust performance on unseen data. This accuracy has important clinical implications for PLDRH. Accurate identification of this anatomical landmark could enhance surgical precision in determining the parenchymal transection plane, reduce unnecessary manipulation during bile duct dissection, and lower the risk of biliary complications, a common concern in donor hepatectomy [8,9]. Ultimately, this could improve overall donor safety, which is paramount in living donor liver transplantation.
However, the quantitative evaluation yielded lower scores, with a mean DSC of 0.472 and IoU of 0.339, a sensitivity of 0.643, and a specificity of 0.993. The disparity between the high clinical accuracy and the modest quantitative metrics can be attributed to the unique challenges of identifying and labeling the bile duct bifurcation site. The common hepatic duct's entry point into the liver, while anatomically present, is often obscured by surrounding connective tissue, and even after dissection this structure is rarely fully exposed during surgery. Our ground truth labeling method, which used circular annotations to indicate the estimated entry point, was designed to accommodate this clinical reality. Although clinically relevant, this approach may have lowered the quantitative scores: a circular annotation marks an approximate region rather than a precisely delineated structure, so it inherently introduces imprecision when evaluated with pixel-level segmentation metrics such as DSC and IoU [17].
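The sensitivity of overlap metrics to small positional offsets can be illustrated with a minimal sketch. It assumes two filled circles of equal radius on a blank image, with the prediction shifted horizontally from the ground truth. The image size, radius, and shifts are hypothetical values chosen for illustration and are not taken from this study, and the helper names `circular_mask` and `overlap_metrics` are likewise illustrative.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]


def circular_mask(center: Pixel, radius: int, width: int, height: int) -> Set[Pixel]:
    """Return the pixels inside a filled circle, clipped to the image bounds."""
    cx, cy = center
    return {
        (x, y)
        for x in range(max(0, cx - radius), min(width, cx + radius + 1))
        for y in range(max(0, cy - radius), min(height, cy + radius + 1))
        if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    }


def overlap_metrics(pred: Set[Pixel], truth: Set[Pixel],
                    width: int, height: int) -> Dict[str, float]:
    """Pixel-level metrics for two binary masks (assumes both masks are non-empty)."""
    tp = len(pred & truth)            # pixels marked in both masks
    fp = len(pred - truth)            # predicted but not annotated
    fn = len(truth - pred)            # annotated but not predicted
    tn = width * height - tp - fp - fn
    return {
        "dsc": 2 * tp / (2 * tp + fp + fn),
        "iou": tp / (tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical geometry: a 256 x 256 frame and a 20-pixel annotation radius.
WIDTH, HEIGHT, RADIUS = 256, 256, 20
truth = circular_mask((128, 128), RADIUS, WIDTH, HEIGHT)

for shift in (0, 10, 20):
    pred = circular_mask((128 + shift, 128), RADIUS, WIDTH, HEIGHT)
    metrics = overlap_metrics(pred, truth, WIDTH, HEIGHT)
    print(shift, {name: round(value, 3) for name, value in metrics.items()})
```

In this toy setting, a prediction displaced by one radius still points at essentially the same location, yet its DSC falls below 0.5, while specificity stays close to 1 because background pixels dominate the frame. This mirrors the pattern of modest DSC and IoU alongside very high specificity described above, although it does not reproduce the study's evaluation pipeline.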
Our human-in-the-loop approach, which integrated expert surgical knowledge into the AI training process, proved highly effective in improving model performance. Expert review of pseudo-labels allowed the model to be refined and inappropriately labeled images to be eliminated, ensuring a high-quality training dataset. The improvement in accuracy from the initial model (89%) to the final model (97%) underscores the value of this approach. Quantitatively, the DSC increased from 0.392 (SD 0.04) to 0.472 (SD 0.04), and the IoU rose from 0.279 (SD 0.03) to 0.339 (SD 0.03). This methodology not only enhanced model performance but also improved the efficiency of the annotation process. While the initial pixel-level segmentation was performed on 150 frames, the initial model allowed us to incorporate an additional 901 frames into the training process. This substantial increase in training data, achieved without time-consuming manual annotation of each frame, demonstrates the scalability and efficiency of the human-in-the-loop approach [18].
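The general pattern of expanding a training set with expert-reviewed pseudo-labels can be sketched as follows. This is a generic illustration under stated assumptions, not the implementation used in this study: the function name `expand_with_reviewed_pseudo_labels` and the `predict` and `expert_accepts` callables are hypothetical stand-ins for the initial model and the surgeon's review step.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Mask = TypeVar("Mask")


def expand_with_reviewed_pseudo_labels(
    labeled: Iterable[Tuple[Frame, Mask]],
    unlabeled: Iterable[Frame],
    predict: Callable[[Frame], Mask],
    expert_accepts: Callable[[Frame, Mask], bool],
) -> List[Tuple[Frame, Mask]]:
    """Add pseudo-labeled frames to the training set only if an expert accepts them."""
    training_set = list(labeled)
    for frame in unlabeled:
        pseudo_label = predict(frame)            # label proposed by the initial model
        if expert_accepts(frame, pseudo_label):  # human-in-the-loop quality gate
            training_set.append((frame, pseudo_label))
    return training_set


# Toy stand-ins (hypothetical): strings play the role of frames and masks.
manually_labeled = [("frame_001", "mask_001")]
new_frames = ["frame_002", "frame_blurry", "frame_003"]

expanded = expand_with_reviewed_pseudo_labels(
    manually_labeled,
    new_frames,
    predict=lambda frame: frame.replace("frame", "mask"),
    expert_accepts=lambda frame, mask: "blurry" not in frame,
)
print(len(expanded))
```

The design choice illustrated here is that the expert only accepts or rejects a proposed label rather than drawing each one, which is where the reduction in annotation effort comes from.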
Despite the promising results, several limitations should be addressed in future studies. First, the single-institution study design limits the generalizability of our findings. Second, the relatively small sample size, while sufficient for this feasibility study, calls for further validation with larger datasets. Third, the challenge of accurately representing the bile duct bifurcation site in ground truth labeling, as evidenced by the discrepancy between clinical and quantitative evaluations, requires further investigation and methodological refinement. Beyond these limitations, real-time application during surgery represents an exciting next step. This would involve integrating the AI model into the surgical workflow and assessing its impact on surgical decision-making and outcomes.
In conclusion, this study demonstrates the potential of AI to enhance the precision and safety of PLDRH. While challenges remain, particularly in bridging the gap between clinical accuracy and quantitative metrics, the high clinical accuracy of our model represents a promising step toward integrating AI into liver transplant surgery. Future multi-center studies with larger datasets and refined evaluation methods will be crucial for validating and expanding upon these encouraging results.