Our study aimed, first, to investigate a range of variables relevant for a reliable comparison of full-endoscopic versus microsurgical decompression in our institution; second, to assess predictive factors associated with PROMs; and third, to use these findings to compare the techniques’ ability to improve PROMs reliably. Adequate case-matching of retrospectively assessed patients allowed us to compensate for the lack of a randomized controlled trial design. Endoscopic treatment of lumbar spinal stenosis was as successful as the conventional microsurgical approach, although it was associated with higher complication rates in our single-center experience. The distribution of complications suggested that the surgeon's learning curve was the main factor behind this finding.
FED and MSD provide equivalent PROM improvement, but FED is associated with higher complication rates
The results show that both techniques are comparable in improving PROMs, with neither showing signs of superiority with regard to PROMs as the outcome of interest. However, we observed more complications in the FED group. Notably, these complications occurred within the first n = 20 patients treated with the full-endoscopic technique in our hospital, and thus this high rate was most likely attributable to the surgeon's learning curve. Further, our analysis revealed a significant inverse relationship between the number of full-endoscopic surgeries performed and the operation time (rho = -0.4219; p = 0.0104), whereas there was no significant correlation in the microsurgical group. In addition, we did not observe significant differences in operation time between the last 16 FED surgeries and the MSD group. These findings indicate that growing surgical experience was accompanied by lower complication rates and shorter operation times. Our results are consistent with the learning curve assessment of Zelenkov et al., who reported that the plateau of the learning curve for full-endoscopic interlaminar and transforaminal surgery is reached within the first 20 patients [23]. As our surgeon had extensive practice in the microsurgical technique, we cannot currently determine whether the full-endoscopic procedure can be generally classified as having a “steep learning curve” or a “shallow learning curve” according to the definition provided by Benzel et al. [24]. In the most extensive analysis of the learning curve in endoscopic decompression of lumbar spinal stenosis, Lee et al. showed that complication rates were higher and operation times were longer in the first cohort of patients treated with FED [25]. The plateau of the operation time was reached after the 100th case, translating to a rather steep learning curve [25]. In addition, the complication rates in the first cohort of patients were twice as high as in the more experienced phase of the learning curve [25].
This might also explain the longer operation time for FED compared to MSD in our cohort. The first 20 FED patients were included in the statistics and generally showed longer OT than those treated thereafter. Other authors generally reported lower complication rates for FED compared to MSD [26–28]. However, no information was provided regarding the learning curves of the surgeons. To summarize, the literature and our findings indicate that complication rates might be higher for FED in the first cohort of patients, but are likely to fall below those of MSD once the plateau of the learning curve has been reached.
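The inverse relationship between case number and operation time reported above is a rank correlation. As a minimal sketch of how such a coefficient can be computed, the following example implements Spearman's rho as the Pearson correlation of tie-averaged ranks. The function names and the data are hypothetical and purely illustrative; they are not the study data, and the sketch does not reproduce the statistical software or the p-value calculation used in the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical illustration only: consecutive case numbers and
# operation times in minutes (NOT data from the study).
case_number = list(range(1, 11))
operation_time_min = [118, 105, 110, 96, 99, 90, 92, 85, 88, 80]

rho = spearman_rho(case_number, operation_time_min)
print(f"Spearman rho = {rho:.3f}")
```

A negative rho in such an analysis means that later cases tend to have shorter operation times, which is the pattern expected along a learning curve.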
Further considerations and perspectives
In contrast to Marković et al., we did not find that the full-endoscopic technique has better outcomes in pain and disability scales [29]. However, their data were obtained over a 3-year period, and we cannot rule out the possibility that PROMs will diverge beyond the 1-year examination. Thus, a long-term evaluation is warranted using either a case-matched design or a randomized controlled trial. Consequently, we are currently conducting a comprehensive cohort trial utilizing a broad range of relevant outcomes to overcome this lack of evidence (German Clinical Trials Register (DRKS): DRKS-ID: DRKS00025786). In addition, we did not focus on other relevant factors that might influence the implementation of the technique in hospitals, such as cost analysis comparisons. In a previous report, cost analysis comparisons between MSD and FED revealed that both procedures had similar costs for hospitalization, radiology, and follow-up visits. Although the costs for the unit to run the operations were 5.7% higher for FED, MSD was 28.1% more expensive than FED when complications were considered, with complication rates of 3.8% for FED and 7.5% for MSD [26].
The full-endoscopic technique to treat lumbar spinal stenosis is advancing. Scarring of the epidural space, the route of access potentially leading to instability of the coordination system, and the generally larger amount of soft-tissue resection might justify the shift towards a more tissue-sparing technique [27]. Continuous technical advances in the visualization of the operative site with modern optics might allow faster progress for FED than for MSD, probably affecting future outcome evaluations. Furthermore, the broad availability of cadaver courses might help surgeons to master the demanding learning curve [30]. In particular, the fact that surgeons in Asia self-report higher skill levels and that endoscopic spine surgery training in Asia is reported to be better integrated into the daily practice of spine surgeons might explain the tendency towards better outcome results for FED compared to MSD in Asian publications [30]. Interestingly, reports from non-Asian countries generally show more comparable results for FED vs. MSD [27, 31–43] than publications from Asian countries [44, 45], which seem to favor FED. However, a future meta-analysis using the publication region as a confounding variable in the meta-regression and subgroup analysis model is warranted to provide an in-depth analysis of this phenomenon. In accordance with Chen et al., we did not find a general tendency of the assessed variables to affect all PROMs similarly [46]. However, they determined alcohol use to be associated with higher re-operation rates. Unfortunately, we could not include this variable as insufficient data were available. In contrast to Chen et al., we additionally assessed laboratory and radiological markers. We found that preoperative CRP levels and a high Schizas score were associated with worse PROMs, particularly for the ODI.
Notably, the finding that preoperative CRP levels influenced the PROMs might be of relevance and requires future exploration in prospective studies. The neural network model we applied confirmed the relevance of several predictors for PROMs. However, the relative errors were not satisfactory, probably due to the limited sample size. Furthermore, information might be lost when radiological images are measured and the resulting data are fed into a machine learning model, compared to an approach in which the radiological images are directly combined with the clinical data (multi-input, mixed-data model). We are currently collecting prospective data to train a multi-input, mixed-data neural network model and predict spine surgery patients' PROMs based on a combination of radiological, clinical, and laboratory predictors. This will allow us to evaluate whether a patient could benefit from surgery and which surgical approach could be better suited for each patient based on the patient’s individual data.
Strengths and limitations
Our study has certain strengths and limitations. One of the main strengths is the extraction of several confounding variables that potentially affect PROMs. Considering these variables in our regression model allowed a more precise estimation of regression coefficients than in previous studies on this topic. Furthermore, based on these findings, we applied propensity score matching, one of the state-of-the-art techniques to maintain comparability between groups in non-randomized study designs [22]. Another advantage is the consideration of the surgeon's learning curve in our results and interpretations, as we included all FED patients since inception. Therefore, the results are especially interesting for clinicians who want to apply this technique in their institutions and are interested in relevant outcomes in the “implementation phase”. Study limitations include the retrospective design, which has several disadvantages, such as the necessity to apply multiple statistical models to allow comparability, which can themselves introduce some susceptibility to error. Furthermore, this study type is prone to selection bias and misclassification bias, as we had to use the data as provided in our patient information system without further validation, and additional consultations to obtain missing data for some variables are often not possible. Alterations in CRP levels are known to be associated with surgical trauma, and peaks are usually observed after 48 hours postoperatively [47, 48]. Nevertheless, the CRP alterations in response to the iatrogenic traumatic injury are highly variable and dependent on numerous patient characteristics [47] and may even be absent in some patients [49]. Due to the retrospective design, we cannot validate whether other patient characteristics might have affected the CRP findings, as no randomization was performed.
Thus, comparisons of CRP levels between studies might be limited, and outcome interpretations and comparisons of the techniques should rely more on the PROMs than on surrogate markers such as CRP. Nevertheless, CRP levels can help to identify adverse outcomes and can be used as one parameter quantifying the degree of surgical trauma [50], although reports in this regard are conflicting [51].
Furthermore, we could not include other potentially relevant variables, such as interleukin-6 as a surrogate marker for tissue damage, as these are not regularly measured and thus not available in the patient information system. Therefore, a large-scale prospective cohort study focusing on a broad range of relevant outcomes is warranted to improve the current knowledge.
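The propensity score matching mentioned among the strengths can be illustrated with a minimal sketch. The specific matching algorithm is not described in this section, so the example assumes a common variant: greedy 1:1 nearest-neighbour matching without replacement and with a caliper. The function name, the caliper value, the patient identifiers, and the scores are all hypothetical, and the scores are assumed to have been estimated beforehand (for example, by a logistic regression of treatment group on the confounders).

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping a patient id to a propensity score.
    caliper: maximum allowed score difference for a valid match
             (0.05 is an arbitrary illustrative choice).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # copy, so the caller's dict is not modified
    pairs = []
    # Process treated patients in order of increasing propensity score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
        # Otherwise the treated patient stays unmatched and is dropped.
    return pairs


# Hypothetical propensity scores (NOT data from the study).
fed = {"F1": 0.62, "F2": 0.35, "F3": 0.90}
msd = {"M1": 0.60, "M2": 0.33, "M3": 0.50, "M4": 0.10}

pairs = greedy_match(fed, msd)
print(pairs)
```

In this toy example, the patient with the highest score has no control within the caliper and is therefore left unmatched, which illustrates how matching trades sample size for comparability between groups.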