An independent inter- and intra-observer agreement assessment of Yeom classification for bone cement leakage following vertebroplasty/kyphoplasty

Abstract
Study Design: An inter- and intra-observer agreement study.
Background: In recent years, vertebroplasty and kyphoplasty have been widely used to treat osteoporotic vertebral compression fractures (OVCF), although their clinical efficacy remains controversial. Complications are also inevitable, the foremost being bone cement leakage (BCL). The Yeom classification is commonly used to evaluate BCL. The objective of this study was to assess its reliability and reproducibility and to explore its clinical application value.
Methods: A total of 58 patients with BCL following vertebroplasty/kyphoplasty were included. Six spine surgeons who were unaware of the patients' identities and of the treatment they had received served as evaluators. They classified BCL according to the Yeom system, and the kappa (K) coefficient was used to assess inter- and intra-observer agreement. The assessment was repeated after 12 weeks.
Results: The inter-observer reliability of the Yeom classification was substantial, with K values of 0.71 (1st assessment) and 0.73 (2nd assessment). The intra-observer reproducibility of the Yeom classification was near perfect, with a K value of 0.88.
Conclusion: The Yeom classification system has substantial inter-observer reliability and near perfect intra-observer reproducibility for BCL following vertebroplasty/kyphoplasty. It can be widely used in clinical care as an appropriate instrument for early observation, for understanding the mechanism and severity, and for predicting the prognosis of BCL. In addition, adding a type M (the mixed type) may improve the classification.


Introduction
Osteoporotic vertebral compression fractures (OVCF) can cause back pain and even spinal deformity in the elderly, seriously affecting patients' quality of life. Traditional open surgery is always accompanied by surgical trauma, and loosening of internal fixation due to osteoporosis often leads to surgical failure. 1 In 1987, the French scholars Galibert et al. 2 first reported the successful use of vertebroplasty in a patient with long-term pain from a C2 cavernous hemangioma. In 1990, Deramond et al. 3 applied vertebroplasty to OVCF and obtained a satisfactory analgesic effect. In 1994, the American scholars Garfin et al. 4 proposed kyphoplasty, which builds on vertebroplasty and corrects kyphosis through balloon expansion. In 2001, Lieberman et al. 5 indicated that kyphoplasty for OVCF could also achieve a significant analgesic effect. Today, the clinical efficacy of vertebroplasty and kyphoplasty is controversial. High-quality evidence from Kallmes and Buchbinder suggests that it is not an effective treatment. 6,7 However, it is still widely used and does have a role, for example in augmentation when fixation is required and in some tumors. Complications are inevitable, the foremost being bone cement leakage (BCL), 8 which manifests mainly as heat injury during solidification of the bone cement and as vascular embolism, mechanical compression, and vertebral biomechanical changes caused by the space occupied by the solidified cement. A small amount of leakage may cause no clinical symptoms and require no treatment, but extensive leakage can have serious or even fatal consequences such as pulmonary embolism and paraplegia. 9,10 Precise evaluation and accurate judgment of BCL are therefore essential, and an appropriate and comprehensive classification system is needed to aid the assessment of BCL.
In 2003, Yeom et al. 11 proposed a classification system based on the channels of leakage, which divides BCL into three types: type B (leakage via the basivertebral vein), type S (leakage via the segmental vein), and type C (leakage through a cortical defect). The types indicate the anatomical localization and the cause of BCL. The authors pointed out that reducing BCL via venous channels is very important because of the significantly higher incidence of types B and S. So far, there is no unified classification standard for BCL, and the Yeom classification is the most commonly used (Figure 1).
An adequate and rational classification system should not only standardize research terminology but also facilitate communication and consultation among specialists. However, although the Yeom classification is widely used in clinical work and in the literature, 12 it has not been assessed and still requires independent validation. The aim of our study was to analyze the inter- and intra-observer agreement of the Yeom classification system in identifying and categorizing BCL and to evaluate its clinical application value.

Patient case selection and evaluation
Institutional review board approval for this study was obtained from our ethics committee. Patients treated for BCL following vertebroplasty/kyphoplasty in the database records of our hospitals from 2015 to 2019 were included and analyzed retrospectively. All participants signed informed consent for inclusion and were required to have complete imaging examinations and available clinical data. Complete imaging studies comprised anterior-posterior (AP) and lateral radiographs and computed tomography (CT) scans, which were used to detect and diagnose BCL. CT scans were performed on EMOTIO dual-slice scanners (Siemens Medical, Germany) with a reconstruction increment of 1 mm. Available clinical data included demographic characteristics, chief complaints, and treatment history. Based on these criteria, we selected a total of 58 BCL cases (19.3%) from 301 patients who underwent vertebroplasty/kyphoplasty. Two residents of our department, who did not participate in the later statistics and analysis, collected these cases from the database. Six spine surgeons who were unaware of the patients' identities and of the treatment they had received were selected as evaluators. None of the 6 spine surgeons had treated any of these patients, which avoided recall of the imaging studies. To make the study sufficiently reliable, we provided each evaluator with the essential original literature and pertinent information for assessing the cases according to the Yeom classification system. Face-to-face meetings and evaluation sessions were held before the agreement study, in which any controversies about the system were discussed until all evaluators reached a consensus. Complete imaging studies and standard imaging reports were provided to the evaluators as reference in order to avoid image selection bias. On the basis of the Yeom classification, the 6 evaluators assigned a BCL type to each case (Figure 2).
We evaluated inter-observer agreement by comparing the initial responses of the 6 evaluators. Intra-observer agreement was determined by comparing each evaluator's two successive responses to the same case, which were separated by a 12-week interval. The cases were presented in random order to minimize recall bias.

Statistical analysis
Stata (software for statistics and data science, version 16.0) was used to analyze the collected data. Considering that the classification of BCL belonged to ordinal data, the kappa (K) coefficient was used to assess inter- and intra-observer agreement for the Yeom classification system (two-way mixed effect model, in which people effects are random and measures effects are fixed), 13 and K values were expressed with a 95% confidence interval (CI). For each type of the Yeom classification, Fleiss' K was used to evaluate inter-observer reliability, while intra-observer reproducibility was assessed by Cohen's K. 14,15 K values range from −1 to 1, with higher values signifying better agreement. According to the recommendations of Fleiss 16 and Landis et al., 17 there were 5 agreement levels of K: 0.00-0.20 indicating slight agreement, 0.21-0.40 fair agreement, 0.41-0.60 moderate agreement, 0.61-0.80 substantial agreement, and 0.81-1.00 near perfect agreement (Table 1).
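To make the two statistics named above concrete, the following is a minimal Python sketch of Cohen's K (two ratings of the same cases) and Fleiss' K (several raters per case). It is an illustration only and not the Stata procedure used in this study; the function names and the toy ratings are hypothetical, and confidence intervals are not computed.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(first)
    # Observed proportion of cases on which the two ratings agree.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal label frequencies.
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels (one per rater) per case."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this case.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_cases
    # Chance agreement from the overall share of each category.
    p_e = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (hypothetical, not study data): Yeom types B, S and C.
round_1 = ["B", "B", "S", "C", "C", "S"]
round_2 = ["B", "S", "S", "C", "C", "S"]
intra_k = cohen_kappa(round_1, round_2)

three_raters = [["B", "B", "B"], ["S", "S", "S"], ["C", "C", "C"]]
inter_k = fleiss_kappa(three_raters)
```

In the toy example the two rounds agree on 5 of 6 cases and chance agreement is 1/3, so Cohen's K is 0.75; the three hypothetical raters agree on every case, so Fleiss' K is 1.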

Results
From 301 patients who underwent vertebroplasty/kyphoplasty, our study involved 58 BCL cases (19.3%), comprising 18 males and 40 females with a mean age of 73.9 years (range, 62 to 87 years). In total, 82 vertebrae in these individuals had BCL, ranging from thoracic (47) to lumbar (35) levels (Table 2). We obtained 492 evaluations from the 6 evaluators in the first assessment, and a further 492 evaluations were acquired after a 12-week interval.
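The counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Figures as reported in the Results paragraph.
patients_total = 301
bcl_cases = 58
males, females = 18, 40
thoracic, lumbar = 47, 35
evaluators = 6

# Share of treated patients with BCL, in percent, to one decimal place.
incidence_pct = round(100 * bcl_cases / patients_total, 1)

# Each evaluator rates every vertebra with BCL once per assessment round.
vertebrae_with_bcl = thoracic + lumbar
evaluations_per_round = vertebrae_with_bcl * evaluators

# Internal consistency of the reported subgroup counts.
sexes_match_cases = (males + females) == bcl_cases
```

This reproduces the reported 19.3% incidence and the 492 evaluations per round (82 vertebrae rated by 6 evaluators).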

Inter-observer reliability
Applying Yeom classification system, the first reliability analysis among 6 evaluators yielded an overall K value of 0.71, which indicated substantial inter-observer agreement. While they were compared with each other, analysis yielded K values of 0.65, 0.82, There were also significant differences between these results (p < 0.01) ( Table 4).

Intra-observer reproducibility
Reproducibility analysis of the same evaluator's results produced an overall K value of 0.88, which showed that the intra-observer agreement of Yeom classification system was near perfect. The K values of each evaluator were as follows: 0.84, 0.91, 0.87, 0.86, 0.93 and 0.90. These values were all considered near perfect agreement. There were significant differences between these results (p < 0.01) ( Table 5).
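As an illustration of how the agreement bands from the statistical analysis apply to these values, the following Python sketch maps a K value to its band. The function name is hypothetical, and the label for negative values is an assumption, since the five bands in the text start at 0.00.

```python
def agreement_level(kappa):
    """Map a kappa value to the agreement band used in this paper."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        # Assumption: the bands listed in the text do not cover negative values.
        return "less than chance"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "near perfect"


# Per-evaluator intra-observer K values reported above.
intra_observer_k = [0.84, 0.91, 0.87, 0.86, 0.93, 0.90]
levels = [agreement_level(k) for k in intra_observer_k]
```

All six reported values fall in the 0.81-1.00 band, consistent with the statement that each evaluator reached near perfect agreement.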

Discussion
In the past decades, vertebroplasty/kyphoplasty has been widely used in spine surgery because of its minimal invasiveness and rapid, efficient pain relief, which are favored by spine surgeons and patients, and it has become one of the most effective methods to treat OVCF. However, as its use has spread, complications have gradually been reported, of which BCL is the most common. According to relevant studies, the incidence of BCL ranges from 19% to 65%, 18-20 and 73% of all complications are related to BCL. 21 Most BCL does not cause clinical symptoms. Ryu et al. 22 reported that the incidence of symptomatic complications was relatively low and was closely connected to the patients' primary disease. Although BCL into paravertebral soft tissue is frequent, it rarely causes clinical symptoms; leakage into muscle is an exception, as it often causes pain due to muscle contraction. 23 BCL into the adjacent intervertebral disc is also common and often asymptomatic, but the study by Lin et al. 24 indicated that the risk of adjacent vertebral fracture increases when cement enters the intervertebral disc. The incidence of BCL into the epidural space or neural foramina may reach 36.5% 23 and can result in neurological dysfunction or even paralysis due to spinal cord compression, 25,26 but most cases are asymptomatic. BCL into the paravertebral venous plexus can lead to pulmonary embolism, intracardiac embolism, and even death, but not all cases are symptomatic. 27,28 Vertebroplasty/kyphoplasty can relieve pain and stabilize vertebrae. Meanwhile, it is important to reduce BCL and manage the related complications. Classifying BCL following vertebroplasty/kyphoplasty makes it easier for physicians to analyze such cases and thus to prevent or reduce BCL in future clinical work. So far, there is no unified classification standard for BCL, although various classification systems continue to emerge.
Based on the location of BCL, Ni WF et al. 29 proposed a classification that divides BCL into six types: perivertebral leakage, spinal canal leakage, intraforaminal leakage, intradiscal leakage, paravertebral soft tissue leakage, and mixed leakage. Similarly, in the study by Wang et al., 30 BCL was classified into 5 types: type A, through a cortical defect into the paraspinal soft tissues; type B, through the basivertebral foramen; type C, via the needle channel; type D, through a cortical defect into the disc space; and type E, via the paravertebral vein. In addition, Qi et al. 31 classified BCL by shape into linear leakage and strip leakage. These classifications are simple, convenient, intuitive, comprehensible, and easy to apply. However, their obvious disadvantage is that they overemphasize the outward form of BCL and cannot reveal its mechanism and patterns.
The Yeom system not only classifies BCL by the channel of leakage, that is, by its appearance, but also conveys its mechanism and patterns. This makes it easier for spine surgeons to understand the relevant risk factors (the viscosity of bone cement is an independent risk factor for type B BCL; the severity and type of fracture are independent risk factors for type S BCL; the severity of fracture and MRI findings of vertebral fissure are independent risk factors for type C BCL), 32 to detect BCL early in the operating theatre, and to predict the prognosis of patients. The classification also analyzes the form of BCL in the spinal canal. It points out that type B cement in the spinal canal is uniformly distributed and small in quantity, that the space occupied does not exceed the anterior 1/3 of the spinal canal, and that fewer patients have neurological symptoms. In contrast, type C BCL entering the spinal canal through cortical defects is often limited to one side and is more likely to cause spinal cord compression and neurological symptoms. Our results showed inter-observer K values of 0.71 and 0.73 in the two assessments, indicating 'substantial' agreement, and an intra-observer K value of 0.88, indicating 'near perfect' agreement, which together indicate good reliability and reproducibility of the Yeom classification system. Despite these advantages and this level of agreement, it must be emphasized that the Yeom classification can only be regarded as an auxiliary tool for early observation, for understanding the mechanism and severity, and for predicting the prognosis of BCL; it does not help with treatment decision-making.

Table 2 (partial). Vertebral level and count: T7, 1; T8, 2; T9, 2; T10, 6; T11, 12; T12, 23; L1, 22; L2, 10; L3, 3; L4, 1. Vertebroplasty/kyphoplasty: 43/15.
In our results, as in the study by Yeom et al., 11 the misinterpretations of BCL type among the 6 evaluators were mainly confusion between type C and type B or between type C and type S. On analysis, we believe that when the cortical defect in type C is not obvious and is located in the posterior part of the vertebral body, cement entering the vertebral canal through the defect is easily confused with type B. Likewise, type C is easily confused with type S when the cement leaking through the defect lies at the edge of the vertebral body or near the pedicle. Another significant problem was that BCL did not always present typically as a single type of the classification. It could present as mixed types, which made it difficult for evaluators to assign a single type and ultimately led to differences between their choices. Therefore, although consistency in judging the type of BCL according to the Yeom classification is high when CT is combined with radiographs, misinterpretations exist, and the classification still needs to be improved in clinical practice. In this regard, we believe that adding a type M (the mixed type) may make the Yeom classification more comprehensive and reliable.
The current study has limitations, and its design could be altered in several ways to better ascertain the inter- and intra-observer error of the Yeom classification system. One limitation is the relatively small sample size. Although our study included more patients than that of Yeom et al., 11 further expanding the sample population would allow more meaningful statistical testing of reliability and reproducibility. Another limitation is recall bias from the evaluators, namely the deviation of results between repeated assessments. This was mentioned by Wang YX et al., 33 who indicated that paired assessments should ideally be performed within a short period of time to minimize the deviation. In our study, the 12-week interval might be relatively long. Finally, the seniority and clinical experience of the participating physicians was an important factor. We chose 6 evaluators who had never treated the participants, so as to prevent them from recalling the imaging studies associated with these patients. As a result, the 6 spine surgeons were relatively junior, which might have affected the accuracy of classification. It may therefore be valuable to repeat this study to explore whether spine surgeons with a higher skill level achieve better agreement than novice or intermediate physicians. In addition, because the anatomical structures of thoracic and lumbar vertebrae differ, agreement for thoracic and lumbar vertebrae under the Yeom classification could be compared in a later study. High-quality, large-sample, multicenter studies should therefore be performed in future clinical work to provide spine surgeons with the best evidence-based information.

Conclusion
The Yeom classification system has substantial inter-observer reliability and near perfect intra-observer reproducibility for BCL following vertebroplasty/kyphoplasty. It can be widely used in clinical care as an appropriate instrument for early observation, for understanding the mechanism and severity, and for predicting the prognosis of BCL. However, misinterpretations and inconsistencies remain, and adding a type M (the mixed type) may improve the classification. Further prospective studies are needed to evaluate whether the Yeom classification system allows better understanding and prevention of BCL in individual patients.