The potential challenges in using instrument detection software lie in data privacy, technical reliability, and acceptance by medical staff. To address these issues, this feasibility study explored the use of deep learning-based computer vision algorithms in open surgery, specifically in orthopaedic surgery. The tracking program used was able to identify the given surgical instruments from a larger dataset in a video sequence.
The bar graph depicting the total distance covered by each surgical instrument in pixels provides valuable insight into the movement dynamics in the analysed surgical video. The data indicate that the Metzenbaum scissors exhibited the greatest degree of movement, followed by the Spring forceps and the Cat paw retractors. The Cat paw retractor showed the least movement, suggesting it remained relatively stationary during the procedure, which is consistent with its real-life application.
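The text does not formally define how the total distance metric is computed. A minimal sketch of one plausible computation, summing the Euclidean distances between consecutive tracked centre points of an instrument's bounding box (the function name and the sample coordinates below are illustrative assumptions, not values from the study), might look like this:

```python
import math

def total_distance(centroids):
    """Sum Euclidean distances (in pixels) between consecutive
    tracked centre points of one instrument."""
    return sum(
        math.dist(a, b)
        for a, b in zip(centroids, centroids[1:])
    )

# Hypothetical per-frame (x, y) centroids for one instrument
metzenbaum_track = [(120, 340), (125, 338), (140, 320), (160, 300)]
print(total_distance(metzenbaum_track))
```

One path length per instrument computed this way would yield the values shown in the bar graph; the per-frame displacements themselves could feed the violin plots discussed below.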
These findings highlight the variability in the use and movement of different surgical instruments during an orthopaedic procedure. The substantial movement of the Metzenbaum scissors and Spring forceps reflects the active role these instruments play in the surgery, such as cutting and holding tissue. In contrast, the relative stability of the Cat paw retractors points to their function as stabilising tools, in line with their intended use of holding tissue apart.
The detailed movement analysis provided by the total distance and violin plots underscores the potential of using computer vision and AI-based tracking systems to enhance the understanding of instrument usage in surgery. This approach can inform training programs by identifying how instruments are used in practice, potentially leading to improved surgical techniques and better patient outcomes.
The current literature reflects the small number of studies testing video-based instrument tracking [18–20].
A few studies set up experiments for instrument tracking using electromagnetic tracking [21–26], whereas only a handful of approaches utilised video tracking [18–20].
The initial research on incorporating spatiotemporal data on surgical instruments relied either on manual reporting of instrument movements or on sensor-based data collection, such as electromagnetic sensors that capture hand and instrument movement during open surgery [24, 26].
As stated by Genovese et al. [26], the need for more efficient training paradigms has expedited the introduction of video recording and motion analysis systems as well as virtual reality simulators. Rather than placing trackers on the instruments, Datta et al. [24] used markers located on specific body protuberances to collect information on hand movements. This tracking method would not be feasible under sterile conditions: the sensor would have to be applied to a sterile pair of gloves, with a second pair of sterile gloves worn on top. Sensor placement in this setup is prone to a higher level of variability caused by the shifting, slipping, and readjustment of gloves during surgical procedures. Hence, translating this tracking method to open orthopaedic surgery is of limited applicability.
The study of Genovese et al. [26] therefore points out that, instead of focusing on human hand kinematics, which carries higher variability, a focus on tool kinematics offers a standardised procedure with less inter-user variability. The study placed electromagnetic sensors on the instruments rather than on the hands to track three-axis absolute orientation for data processing. The outcome measures were path length, number of movements, and time to task completion [26].
Cavaliere et al. [25] used planar body-mounted electromagnetic sensors to reduce the above-mentioned inter-user variability in sensor placement and the resulting measurements. Even with fixed sensors, Cavaliere et al. noted that their method proved too slow for real-time tracking and that extensive system refinement was still needed to compensate for metallic distortions from the environment [25]. It should also be noted that installing body sensors could be difficult to implement in orthopaedic surgery. Most orthopaedic operations use intraoperative X-rays to visualise fracture reduction or implants, so patients and surgeons have to be adequately protected by wearing lead aprons, which might further skew or block the signals sent from the body sensors.
While correctly stating some of the strengths of electromagnetic tracking, such as small sensor size and the absence of line-of-sight restrictions, Lugez et al. [23] addressed another issue with this approach: electromagnetic measurement accuracy decreased with the insertion depth of the surgery as well as when tracking at the edges of the measurement volume [23]. With video recording, the hurdles of surgical depth and edge tracking could be eliminated, since a visual tracking system can be mounted at various angles and zoom settings to ensure a wide camera view.
Because of the high expense and the intricate integration of new hardware into the surgical theatre, researchers are now focusing on computer vision algorithms for surgical video analysis [19, 27]. These techniques are markedly more cost-effective and can operate with as little as a single camera covering the surgical field. By harnessing the power of computer vision models, these methods aim to enhance the safety and comprehensiveness of surgical care. Numerous potential applications of computer vision in surgery have been explored, including surgical skill assessment, surgical phase recognition, and the detection of surgical tools and hands [9, 11]. These investigations highlight the versatility and potential of computer vision algorithms in addressing the various challenges encountered.
Ganni et al. [20] retrospectively analysed videos of laparoscopically performed cholecystectomies. They noted that one of the challenges of video tracking is ensuring the instruments' visibility at all times: even with numerous cameras and optimised angles, the tracking analysis can be impaired if an instrument moves out of the camera's field of view or is covered by tissue or fluids.
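How missing detections are treated directly affects downstream metrics such as the path lengths discussed above. One possible safeguard, not described in the studies cited and shown here only as an illustrative sketch, is to break the trajectory whenever the instrument is undetected, so that its re-entry into view does not add a spurious jump to the accumulated distance:

```python
import math

def total_distance_with_gaps(track):
    """Sum pixel distances between consecutive detections, breaking
    the trajectory at frames where the instrument was occluded or
    out of view (None), so re-entry does not add a spurious jump."""
    total = 0.0
    prev = None
    for point in track:
        if point is None:   # occluded / out of the camera's view
            prev = None     # break the trajectory here
            continue
        if prev is not None:
            total += math.dist(prev, point)
        prev = point
    return total

# Frames 3-4 missing: the leap from (10, 10) to (80, 80) is not counted
track = [(0, 0), (10, 10), None, None, (80, 80), (90, 80)]
print(total_distance_with_gaps(track))
```

Whether to skip such gaps, interpolate across them, or flag the segment for manual review is a design choice each tracking pipeline must make explicitly.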
A further challenge is the high computational demand of training these intricate computer vision models and performing model inference. Training on non-personal data does not need to take place in healthcare facilities and can be performed at institutions with greater computational resources, such as computing clusters. Performing predictions in open surgery settings, however, is more critical due to personal data regulations, which allow the footage to be processed on-premises only. This requires high-performance edge devices equipped with graphics processing units (GPUs), which come at higher cost and require extra management of resources and data processing.
Certain limitations of this study should be taken into consideration. The dataset entails a video sequence documenting a single surgical procedure, specifically a sagittal band reconstruction. This narrow focus may limit the extension of the findings to more intricate, complex, and lengthy surgical procedures, which may involve a broader array of instruments and a higher degree of variability in surgical technique; the effectiveness of the proposed computer vision-based approach in such settings therefore remains uncertain. Another limitation is the number of instruments in the video: the results are based on a single procedure, and the instruments used do not reflect the sheer number and range of surgical instruments employed in extensive, highly specialised orthopaedic surgeries. Additionally, lengthy procedures might pose challenges related to camera angles and lighting conditions, including shadows and specular reflections. Moreover, the preparation required for the model to work, involving a dataset of previously annotated images, can be time-consuming and labour-intensive. Training the model requires expertise in computer vision as well as surgical knowledge, and annotation inaccuracies can introduce bias, significantly impair the performance and reliability of the model, and lead to inaccurate results. Lengthy procedures also pose a challenge in terms of the sheer load of data that needs to be processed, stored, and analysed after each video tracking session.
The resolution of these hurdles necessitates continued research and collaboration between computer vision specialists and medical professionals to ensure the safety and efficacy of computer vision systems in surgical settings.
For the future, it can be postulated that a comprehensive understanding of both computer vision and orthopaedic surgery will be essential for creating multifaceted and useful applications in this field. Therefore, it is imperative to provide education in surgical data science. The goal of this training in data science is to equip healthcare professionals with the ability to incorporate AI into their decision-making processes, leading to enhanced diagnostic accuracy and treatment outcomes. It is equally important for data scientists and engineers to be exposed to clinical issues, allowing them to acquire a deeper understanding of the practical challenges and requirements in a medical context. This interdisciplinary approach fosters the development of more robust and clinically relevant AI solutions.
The program presented here will be further enhanced by integrating supplementary devices and by assessing the effectiveness of our approach on multiple orthopaedic procedure videos. In doing so, our aim is to validate and improve the precision of our models, ensuring they are reliable and effective in a real-world clinical setting. The ongoing iterative process of model refinement and validation is crucial, as it allows continuous enhancement of our technique. This approach not only improves the accuracy of AI-assisted surgical insights but also augments decision-making capabilities, ultimately leading to better patient outcomes.
Through iterative testing and feedback, our models can adapt to new data and scenarios, continuously learning and improving. This dynamic process can enhance the accuracy, precision, and efficiency of these tools in the field of orthopaedic surgery. By endorsing collaboration between healthcare professionals and data scientists, we can drive innovation and ensure AI technologies are seamlessly integrated into clinical practice. This integration has the potential to transform the landscape of surgical care and contribute to improved patient outcomes and the advancement of orthopaedic surgery.