DOI: https://doi.org/10.21203/rs.3.rs-107456/v1
Most patients with cervical spondylotic myelopathy (CSM) suffer from neck pain, sensory disturbance and motor dysfunction. For CSM surgery, it is necessary to evaluate preoperative inter-vertebral disc degeneration (IDD), which determines whether to adopt a fusion strategy, and postoperative IDD, which is one of the main reasons for reoperation. The modified Pfirrmann grading system is commonly used to evaluate IDD. The objective of this study is to evaluate its reliability and reproducibility for cervical IDD in CSM patients, and to explore its clinical application value.
A total of 165 patients with CSM were enrolled. Six physicians (3 spine surgeons and 3 radiologists) with clinical experience were selected. They graded each cervical inter-vertebral disc according to the modified Pfirrmann grading system, and we used the intra-class correlation coefficient (ICC) and weighted kappa (wκ) to assess inter- and intra-observer agreement. After 12 weeks, the grading and analysis were repeated.
The inter-observer reliability of the modified Pfirrmann grading system was excellent, with an ICC value of 0.76, and near perfect, with a wκ value of 0.82. The intra-observer reproducibility was excellent, with ICC values ranging from 0.80 to 0.91, and near perfect, with wκ values ranging from 0.83 to 0.92.
The modified Pfirrmann grading system has excellent inter-observer reliability and intra-observer reproducibility for cervical IDD in CSM. In addition, it shows good applicability among spine surgeons and radiologists, and clinical and radiological studies applying it should be deemed accurate. Thus, the modified Pfirrmann grading system can be widely used as an appropriate instrument in clinical care.
Cervical spondylotic myelopathy (CSM) is the most serious type of cervical spondylosis. It refers to a group of symptoms such as neck pain, limb dysfunction and even paralysis, which are caused by compression of the spinal cord due to degeneration of the cervical vertebrae, inter-vertebral discs, surrounding ligaments and other soft tissues.[1, 2] The proportion of hospitalized patients with CSM is 4.04 per 100,000 per year, which has doubled in the past 10 years, and the annual number of patients undergoing surgery has increased more than 7-fold.[3]
For CSM surgery, the choice of operation method is the result of comprehensive evaluation. In addition to a detailed understanding of cervical spine stability, operative segment, and severity of spinal stenosis and spinal cord compression, it is also necessary to evaluate degeneration of the operative and adjacent cervical segments, so as to make a more detailed operation plan.[4, 5] Preoperative degeneration, especially inter-vertebral disc degeneration (IDD), determines whether to adopt a fusion strategy, while postoperative degeneration of the adjacent segment is one of the main reasons for reoperation. In past clinical work, owing to the lack of a clear definition of IDD, there were few reliable criteria to determine the surgical strategy for CSM patients or to judge the prognosis and risk factors of the operation. It is therefore of great importance to establish a comprehensive and rational grading system for IDD based on modern imaging examinations. Magnetic resonance imaging (MRI) is considered to be the best imaging tool for the evaluation of IDD, because it shows both disc morphology and hydration.
In 2007, Griffith et al.[6] proposed the modified Pfirrmann grading system, which classifies lumbar inter-vertebral discs into 8 grades for the qualitative study of IDD (Table 1, Fig. 1). The 8 grades represent a progression from normal disc to severe disc degeneration. Grade 1 corresponds to no disc degeneration, while Grade 8 corresponds to end-stage degeneration.
Grade | Signal From Nucleus and Inner Fibers of Anulus* | Distinction Between Inner and Outer Fibers of Anulus at Posterior Aspect of Disc | Height of Disc |
---|---|---|---|
1 | Uniformly hyperintense, equal to CSF | Distinct | Normal |
2 | Hyperintense (> presacral fat and < CSF) ± hypointense intranuclear cleft | Distinct | Normal |
3 | Hyperintense though < presacral fat | Distinct | Normal |
4 | Mildly hyperintense (slightly > outer fibers of anulus) | Indistinct | Normal |
5 | Hypointense (= outer fibers of anulus) | Indistinct | Normal |
6 | Hypointense | Indistinct | < 30% reduction |
7 | Hypointense | Indistinct | 30–60% reduction |
8 | Hypointense | Indistinct | > 60% reduction |
*Grades 1, 2, and 3 are based on the signal intensity of the nucleus and inner fibers of anulus. For Grade 4, the margins between the inner and outer fibers of the anulus at the posterior margin of the disc are indistinct. For Grade 5, the disc is uniformly hypointense, although there is no loss of disc space height. For Grades 6, 7, and 8, there is progressive loss of disc space height. These could be broadly classified as mild, moderate, and severe loss of disc space height. Very occasionally, although obvious disc collapse is present, hyperintense signal from the nucleus and inner fibers of the anulus is preserved. This is referred to by a double entry, e.g., 4/7, with the former reporting the disc signal and the latter the degree of collapse.
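The decision logic of Table 1 can be written down as a short lookup. The following Python sketch is illustrative only: the signal labels (`equal_csf`, `below_fat`, and so on) and the function name are hypothetical names introduced here, the height loss is assumed to be available as a percentage, and the rare double-entry case (e.g., 4/7) is not handled.

```python
def modified_pfirrmann_grade(signal: str, height_loss_pct: float = 0.0) -> int:
    """Map the features of Table 1 to a grade from 1 to 8 (illustrative sketch).

    signal: hypothetical label for the nucleus / inner anulus signal.
    height_loss_pct: reduction in disc height in percent (0 means normal height).
    """
    if not 0.0 <= height_loss_pct <= 100.0:
        raise ValueError("height_loss_pct must be between 0 and 100")

    # Grades 1 to 4 are decided by signal intensity alone (disc height is normal).
    by_signal = {
        "equal_csf": 1,            # uniformly hyperintense, equal to CSF
        "between_fat_and_csf": 2,  # > presacral fat and < CSF
        "below_fat": 3,            # hyperintense though < presacral fat
        "mildly_hyperintense": 4,  # slightly > outer fibers of anulus
    }
    if signal in by_signal:
        return by_signal[signal]
    if signal != "hypointense":
        raise ValueError(f"unknown signal label: {signal}")

    # Hypointense discs (Grades 5 to 8) are separated by loss of disc height.
    if height_loss_pct == 0:
        return 5
    if height_loss_pct < 30:
        return 6
    if height_loss_pct <= 60:
        return 7
    return 8
```

In practice the grade is assigned visually by the evaluator; the sketch only makes the thresholds of Table 1 explicit.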
For IDD in CSM, an adequate and rational grading system would standardize research terminology, allow easier communication among physicians and help to determine surgical strategy in individual patients. However, no study has evaluated the modified Pfirrmann grading system and its application in cervical IDD, so it still requires independent validation. In view of the frequent need for MRI studies categorizing IDD in the evaluation of patients with CSM, our study aims to analyze the inter-observer reliability and intra-observer reproducibility of the modified Pfirrmann grading system. This is also the first study to assess its application value in cervical IDD.
This study was performed in accordance with the Declaration of Helsinki, and institutional review board approval was obtained from our ethics committee, with signed informed consent provided by all participating subjects. Database records of patients with CSM admitted to Shanghai Longhua Hospital from 2018 to 2019 were retrospectively collected and analyzed. Cervical spine MRI examinations and available clinical data were required for inclusion. Cases with concurrent cervical spine fracture, tumor, infection, or instrumentation in the cervical spine were excluded. A complete MRI examination had to include T2-weighted turbo spin-echo sagittal images without fat suppression to cover all grades of the modified Pfirrmann grading system. MRI data were acquired with a 1.5-T whole-body imaging system. The complete and available clinical data included demographic characteristics, complaints, spinal cord and neurological function, concomitant diseases, and treatment history.
One resident of our department, who did not participate in the later statistics and analysis, collected the cases from our patient database. Physicians who had treated the patients were not allowed to act as assessors. Six physicians from two specialties (three spine surgeons and three radiologists) volunteered to be evaluators; they were blinded to the identity of the patients, the treatment they received, and the original classification used in clinical care. To make the study sufficiently reliable, each evaluator was provided with the necessary original literature and relevant information to evaluate cases on the basis of the modified Pfirrmann grading system. Any differing opinions about the system were discussed in face-to-face meetings before the assessment until all the evaluators reached a consensus. Standard image reports were available to evaluators as reference. According to the modified Pfirrmann grading system, each of the six evaluators independently assigned a single grade to each cervical inter-vertebral disc (from C2–C3 to C7–T1).
Inter-observer reliability was evaluated by comparing the initial responses of all six evaluators. Intra-observer reproducibility was assessed by comparing the same evaluator’s two responses to the same case at an interval of 12 weeks, and the cases were presented in a random order to minimize recall bias.
All data analyses were performed using Statistical Package for the Social Sciences (SPSS) software (version 22.0). Because disc grades are ordinal data, we adopted the intra-class correlation coefficient (ICC) and weighted kappa (wκ) to measure inter- and intra-observer agreement for the modified Pfirrmann grading system (two-way mixed effects model, in which people effects are random and measures effects are fixed).[7] ICC makes it possible to analyze agreement across multiple observers and responses, while wκ makes it possible to assess agreement when not all disagreements are equally significant. ICC values were expressed with a 95% confidence interval (CI). For each grade of the modified Pfirrmann grading system, Fleiss’s κ was used to assess inter-observer reliability, and intra-observer reproducibility was measured by Cohen’s κ.[8, 9] ICC values range from 0 to 1, while κ values range from −1 to 1; the larger the value, the better the agreement. Based on the recommendations of Fleiss[10] and Landis et al.,[11] there were three levels of ICC, with values of 0.00 to 0.40 considered poor agreement, 0.40 to 0.74 fair to good agreement, and 0.75 to 1.00 excellent agreement, while agreement for κ was divided into five levels, with values of 0.00 to 0.20 considered slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and 0.81 to 1.00 near perfect agreement (Table 2).
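For readers without access to SPSS, the ICC under a two-way mixed effects model can be sketched from the usual ANOVA mean squares. The Python sketch below assumes the single-measures, consistency form, often written ICC(3,1); the text above does not state which form was selected in SPSS, so this choice, the function name `icc_consistency`, and the input layout (one row per disc, one column per evaluator) are assumptions made for illustration.

```python
def icc_consistency(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single measures.

    ratings: list of rows, one row per rated disc, one column per evaluator.
    This form is an assumption; other ICC forms give different values.
    """
    n = len(ratings)     # number of rated discs
    k = len(ratings[0])  # number of evaluators
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into discs, evaluators and residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

Under the consistency definition, a constant offset between evaluators (one evaluator always grading one step higher) does not lower the coefficient; an absolute-agreement form would penalize it.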
ICC / Level of Agreement | κ / Level of Agreement |
---|---|
0.00–0.40 / Poor | 0.00–0.20 / Slight |
0.40–0.74 / Fair to good | 0.21–0.40 / Fair |
0.75–1.00 / Excellent | 0.41–0.60 / Moderate |
/ | 0.61–0.80 / Substantial |
/ | 0.81–1.00 / Near perfect |
Intra-class correlation coefficient (ICC) and kappa coefficient (κ value) are used for consistency testing and are indexes of the accuracy of classification.
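The cut-offs of Table 2 can be expressed as two small lookup functions. This is a sketch with hypothetical function names; because the printed ICC ranges share the boundary 0.40, the sketch assumes that a value of exactly 0.40 falls in the "Fair to good" band, and values are rounded to two decimals as in the tables.

```python
def icc_level(icc: float) -> str:
    """Level of agreement for an ICC value according to Table 2."""
    value = round(icc, 2)
    if not 0.0 <= value <= 1.0:
        raise ValueError("ICC outside the range covered by Table 2")
    if value < 0.40:   # assumption: exactly 0.40 counts as fair to good
        return "Poor"
    if value < 0.75:
        return "Fair to good"
    return "Excellent"


def kappa_level(kappa: float) -> str:
    """Level of agreement for a kappa value according to Table 2."""
    value = round(kappa, 2)
    if not 0.0 <= value <= 1.0:
        raise ValueError("kappa outside the range covered by Table 2")
    if value <= 0.20:
        return "Slight"
    if value <= 0.40:
        return "Fair"
    if value <= 0.60:
        return "Moderate"
    if value <= 0.80:
        return "Substantial"
    return "Near perfect"
```

For example, `icc_level(0.76)` and `kappa_level(0.82)` return the labels used for the overall inter-observer results.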
After applying the exclusion criteria, a total of 165 consecutive cases from our patient database from 2018 to 2019 were included in this study, comprising 94 males and 71 females with an average age of 63.5 ± 2.4 years (range, 43 to 85 years) (Table 3). There were 990 cervical inter-vertebral discs altogether in these individuals, and because each disc was evaluated by six evaluators, one assessment yielded 5940 records. After 12 weeks, we acquired another 5940 records.
No. | Content | Number |
---|---|---|
1 | Sex (Male/Female) | 94 / 71 |
2 | Age (mean, years) | 63.5 ± 2.4 |
3 | Complaints: neck pain | 86 |
3 | Complaints: sensory dysfunction | 160 |
3 | Complaints: motor dysfunction | 154 |
4 | Sensory dysfunction: upper extremity | 146 |
4 | Sensory dysfunction: lower extremity | 93 |
4 | Sensory dysfunction: trunk | 80 |
5 | Motor dysfunction: upper extremity | 128 |
5 | Motor dysfunction: lower extremity | 133 |
6 | Bladder function: normal | 91 |
6 | Bladder function: abnormal | 74 |
7 | Surgical / non-surgical treatment history | 2 / 163 |
In the first assessment using the modified Pfirrmann grading system, the discs were classified as Grade 1 (6 discs), Grade 2 (30 discs), Grade 3 (1346 discs), Grade 4 (1178 discs), Grade 5 (1602 discs), Grade 6 (1123 discs), Grade 7 (415 discs), and Grade 8 (240 discs). In the assessment 12 weeks later, the distribution was Grade 1 (4 discs), Grade 2 (19 discs), Grade 3 (1357 discs), Grade 4 (1285 discs), Grade 5 (1498 discs), Grade 6 (997 discs), Grade 7 (488 discs), and Grade 8 (292 discs) (Table 4).
Grade | First Assessment (Number / Proportion) | Second Assessment (Number / Proportion) |
---|---|---|
1 | 6 / 0.1% | 4 / 0.1% |
2 | 30 / 0.5% | 19 / 0.3% |
3 | 1346 / 22.7% | 1357 / 22.8% |
4 | 1178 / 19.8% | 1285 / 21.6% |
5 | 1602 / 27.0% | 1498 / 25.2% |
6 | 1123 / 18.9% | 997 / 16.8% |
7 | 415 / 7.0% | 488 / 8.2% |
8 | 240 / 4.0% | 292 / 4.9% |
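The record counts and the proportions in Table 4 can be re-derived with a few lines of arithmetic. The Python sketch below uses only the numbers reported in the text and in Table 4; the variable and function names are chosen here for illustration.

```python
# Numbers taken from the text: 165 patients, 6 disc levels (C2/3 to C7/T1), 6 evaluators.
patients, disc_levels, evaluators = 165, 6, 6
discs = patients * disc_levels   # discs in the cohort
records = discs * evaluators     # graded records per assessment

# Grade counts as reported in Table 4.
first_assessment = {1: 6, 2: 30, 3: 1346, 4: 1178, 5: 1602, 6: 1123, 7: 415, 8: 240}
second_assessment = {1: 4, 2: 19, 3: 1357, 4: 1285, 5: 1498, 6: 997, 7: 488, 8: 292}


def proportions(counts):
    """Share of each grade in percent, rounded to one decimal as in Table 4."""
    total = sum(counts.values())
    return {grade: round(100 * n / total, 1) for grade, n in counts.items()}


for name, counts in (("first", first_assessment), ("second", second_assessment)):
    # Each assessment should account for every graded record.
    assert sum(counts.values()) == records
    print(name, proportions(counts))
```

Both grade distributions sum to the number of records per assessment, which confirms that Table 4 counts individual gradings rather than distinct discs.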
Based on reliability analysis of the results among the six evaluators, the overall inter-observer agreement of the modified Pfirrmann grading system was excellent by ICC and near perfect by wκ. The ICC value was 0.76 [95% CI, (0.74, 0.78)], while the wκ value was 0.82 [95% CI, (0.78, 0.86)]. The inter-observer agreement for each of the six cervical disc levels (C2/3, C3/4, C4/5, C5/6, C6/7 and C7/T1) was good throughout (ranging from fair to good, excellent, substantial, to near perfect), which indicated no marked difference in agreement among the cervical disc levels (Table 5).
Disc level | Evaluations | ICC / Level of Agreement | wκ / Level of Agreement |
---|---|---|---|
All discs | 5940 | 0.76 / Excellent | 0.82 / Near perfect |
C2/3 | 990 | 0.68 / Fair to good | 0.77 / Substantial |
C3/4 | 990 | 0.79 / Excellent | 0.83 / Near perfect |
C4/5 | 990 | 0.84 / Excellent | 0.90 / Near perfect |
C5/6 | 990 | 0.72 / Fair to good | 0.74 / Substantial |
C6/7 | 990 | 0.85 / Excellent | 0.85 / Near perfect |
C7/T1 | 990 | 0.74 / Fair to good | 0.87 / Near perfect |
We compared the agreement evaluation of spine surgeons with that of radiologists, and found no significant difference between the two specialties [ICC = 0.85 (0.79–0.91), wκ = 0.77 (0.74–0.80)]. In addition, each specialty had excellent or near perfect agreement [spine surgeons: ICC = 0.88 (0.86–0.90), wκ = 0.90 (0.87–0.93); radiologists: ICC = 0.78 (0.74–0.82), wκ = 0.86 (0.81–0.91)] (Table 6).
Comparison | Evaluations | ICC / Level of Agreement | wκ / Level of Agreement |
---|---|---|---|
Intra-specialty: spine surgeons | 2970 | 0.88 / Excellent | 0.90 / Near perfect |
Intra-specialty: radiologists | 2970 | 0.78 / Excellent | 0.86 / Near perfect |
Inter-specialty | / | 0.85 / Excellent | 0.77 / Substantial |
Similar to the first assessment, the repeated assessment after 12 weeks indicated that the inter-observer agreement of modified Pfirrmann grading system was also excellent or near perfect [ICC = 0.79 (0.77, 0.81), wκ = 0.81 (0.77, 0.85)]. The following reproducibility analysis of the same evaluator's results showed excellent intra-observer agreement, with all ICC and wκ values higher than 0.80. Besides, the intra-observer agreement based on disc level was excellent as well, which showed no difference in agreement evaluation of various cervical discs (Table 7).
Evaluator∗ | ICC / Level of Agreement | wκ / Level of Agreement |
---|---|---|
A | 0.89 / Excellent | 0.90 / Near perfect |
B | 0.86 / Excellent | 0.84 / Near perfect |
C | 0.80 / Excellent | 0.83 / Near perfect |
D | 0.83 / Excellent | 0.84 / Near perfect |
E | 0.91 / Excellent | 0.89 / Near perfect |
F | 0.86 / Excellent | 0.92 / Near perfect |
Overall | 0.84 / Excellent | 0.87 / Near perfect |
∗A, B, C, D, E, F represent the 6 evaluators who participated in the study.
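Intra-observer agreement of this kind compares two sets of grades given by the same evaluator, which is the setting of Cohen's weighted kappa. The Python sketch below is a minimal illustration, not the SPSS procedure used in the study: the weighting scheme is not stated in the text, so linear weights are assumed by default (`power=1`; `power=2` gives quadratic weights), and the function name and arguments are hypothetical.

```python
def weighted_kappa(first, second, n_grades=8, power=1):
    """Cohen's weighted kappa for two rating sessions on grades 1..n_grades.

    first, second: equally long lists of integer grades for the same discs.
    power: 1 for linear disagreement weights (assumed here), 2 for quadratic.
    """
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("need two non-empty rating lists of equal length")

    # Cross-tabulate the two sessions.
    observed = [[0] * n_grades for _ in range(n_grades)]
    for a, b in zip(first, second):
        observed[a - 1][b - 1] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_grades)) for j in range(n_grades)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            # Disagreement weight grows with the distance between the two grades.
            weight = (abs(i - j) / (n_grades - 1)) ** power
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one grade")
    return 1.0 - weighted_observed / weighted_expected
```

With weights of this form, a one-grade disagreement (for example Grade 4 versus Grade 5) is penalized far less than a disagreement of several grades, which is why wκ suits an ordinal 8-grade scale.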
In recent years, the incidence of CSM has significantly increased.[12] Under the influence of environmental factors and ageing, IDD, degeneration of the small cervical joints and formation of vertebral marginal osteophytes may cause spinal stenosis and chronic compression of the spinal cord, leading to neck pain, motor dysfunction and even paralysis.
At present, it is accepted that the pathological process of CSM mainly includes static, dynamic and ischemic mechanisms.[13, 14, 15] Cervical IDD is considered to be the trigger of the static mechanism. It leads to changes in the biomechanics of the cervical spine, which may induce the formation of spurs at the vertebral endplate. Meanwhile, the herniation of degenerative discs squeezes the ligamentum flavum and makes it protrude into the spinal canal, causing spinal canal stenosis. When the structure of the cervical spine is abnormal due to the static mechanism, flexion and extension of the cervical spine precipitate irreversible damage to the spinal cord.[16] If the cervical instability caused by IDD occurs in the motion segment, it results in dynamic compression of the spinal cord. As the pathological process progresses, the stability and joint degeneration of this segment gradually deteriorate, and the spinal canal becomes increasingly narrow.[13] In brief, IDD plays a leading role in the pathogenesis of CSM.
A recent study shows that non-surgical treatment is not suitable for moderate and severe CSM, because there is no evidence that non-surgical treatment can effectively inhibit or reverse the natural history of CSM. Progression of the disease brings serious consequences, such as deterioration in quality of life, significant dysfunction and an adverse impact on surgical efficacy, and the risk of secondary spinal cord injury or central cord syndrome is higher. Therefore, it is generally believed that once CSM is diagnosed, surgery should be performed as early as possible.[17] As mentioned earlier, preoperative IDD largely determines surgical strategy, while postoperative IDD plays a decisive role in prognosis. With the continuous development of relevant grading systems, treatment concepts and techniques, surgical decision-making in CSM has been further improved, providing important clinical value for standardized treatment.
In 2001, Pfirrmann et al.[18] developed the best-known MRI-based grading system, dividing IDD into five grades according to disc signal intensity, disc structure, distinction between nucleus and anulus, and disc height. Although this classification has been widely accepted and shown to have excellent inter- and intra-observer agreement,[19] Griffith et al.[6] found that it lacked discriminatory power when applied to IDD in the elderly spine, and that, on the basis of the images and descriptions provided, there were ambiguities in grading IDD as one level or another. To address these deficiencies, Griffith et al.[6] proposed a modified Pfirrmann classification that increased the number of grades from 5 to 8 (Table 1, Fig. 1), so as to improve its discriminatory power when evaluating the elderly spine and to minimize ambiguity when selecting grades.
The establishment of the modified Pfirrmann grading system not only gives spine surgeons a clear definition of IDD, but also supports treatment planning and prediction of prognosis for patients with CSM. At present, the JOA[20] and NDI[21] scoring systems are the most commonly used criteria to evaluate the treatment of patients with CSM. In particular, the JOA system divides CSM into three levels (mild, moderate and severe) according to the score, in order to help physicians determine whether patients need surgery as soon as possible. It is worth mentioning that both scoring systems focus on patients, especially their functional status, but neither the JOA nor the NDI score addresses the cervical spine itself, whether vertebra, inter-vertebral disc, spinal canal or spinal cord. Hence the advantage of the modified Pfirrmann grading system is obvious.
However, it must be emphasized that cervical IDD is only one consideration for surgery. Other important related factors include non-surgical treatment, cervical spine stability, operative segment, severity of spinal stenosis and spinal cord compression, and prognosis. Thus, the modified Pfirrmann grading system can only be regarded as an important reference for surgery, and the most suitable treatment scheme can be formulated by combining it with the JOA and NDI scores.
The results show that the inter-observer agreement of the modified Pfirrmann grading system in both assessments (ICC: 0.76, 0.79; wκ: 0.82, 0.81) is slightly higher than that reported by Griffith et al.[6], while the intra-observer agreement in this study is excellent (ICC: 0.84; wκ: 0.87), similar to that of Griffith et al.[6], indicating that the modified Pfirrmann grading system has very good consistency. It is noteworthy that the evaluators involved in establishing the modified Pfirrmann grading system were all radiologists (two musculoskeletal radiologists and a general radiologist). In contrast, the six physicians in our study came from two specialties (three spine surgeons and three radiologists), which gave us a multi-angle and more comprehensive understanding of the imaging manifestations of IDD and may be one of the factors behind the slight differences in results between the two articles.
The current study has limitations that could be addressed to better ascertain the inter- and intra-observer error of this grading system. First, the sample size is relatively small. Although the number of patients included in our study is larger than that of Pfirrmann et al.[18] and Griffith et al.[6], further expanding the sample will allow for more meaningful statistical testing of agreement. Second, there is recall bias from evaluators, namely the deviation of results between repeated assessments in all evaluators, as shown in Table 6. This deviation has been mentioned by Wang YX et al.[22], who found no significant difference in repeated assessments performed on the same day by the same evaluator, but an obvious deviation when the same evaluator made further assessments 8 months later. Thus, in any study setting, paired assessments should ideally be conducted within a short period of time, and the 12-week interval in our study might still be too long. Third, the difference in specialty is an important factor. Although the evaluators came from two specialties and the multidisciplinary team might increase the comprehensiveness of this study, we must point out that the radiologists did not specialize in the spine and lacked a deep understanding of IDD and the grading system, which may affect the accuracy of the final result. It may therefore be valuable to repeat this study with senior spine surgeons only, to explore whether a higher skill level and specialization produce better agreement than that achieved by junior evaluators or a multidisciplinary team. Finally, as mentioned above, postoperative IDD is a cause of poor prognosis and reoperation, but we excluded patients with instrumentation in the cervical spine so as to allow a better judgment of the inter-vertebral disc.
This issue was much debated when we designed the study; after long discussions, we decided to exclude all such factors, including fracture, tumor, infection, and presence of instrumentation. We will place more emphasis on postoperative IDD in later studies. High-quality, large-sample, multicenter studies should therefore be performed in our future clinical work to provide spine surgeons with the best evidence-based information.
The modified Pfirrmann grading system has excellent inter-observer reliability and intra-observer reproducibility for cervical IDD in CSM. In addition, it shows good applicability among spine surgeons and radiologists, and clinical and radiological studies applying it should be deemed accurate. Thus, the modified Pfirrmann grading system can be widely used as an appropriate instrument in clinical care. However, prospective studies are still needed to evaluate whether this grading system allows better decision-making or prognosis prediction in individual patients.
CSM (cervical spondylotic myelopathy)
IDD (inter-vertebral disc degeneration)
MRI (magnetic resonance imaging)
SPSS (Statistical Package for the Social Sciences)
ICC (intra-class correlation coefficient)
wκ (weighted kappa)
JOA (Japanese Orthopedic Association)
NDI (Neck Disability Index)
Ethics approval and consent to participate
The case was reviewed by the Longhua Hospital Ethics Committee and ethical approval was waived as written consent was obtained from the patient.
Consent for publication
Written patient consent was obtained for publication of all aspects of the case including personal and clinical details and images, which may compromise anonymity.
Availability of data and material
All supporting data can be provided upon request to the authors.
Competing interests
All authors read and approved the final manuscript and declare that they have no competing interests.
Funding
No funding was obtained for this study.
Authors’ contributions
XCQ and YMC are co-first authors of this manuscript. XCQ designed the study and collected the data. YMC did the data analysis. XCQ wrote the manuscript. MW revised the manuscript and decided to submit the manuscript for publication. All authors read and approved the final manuscript.
Acknowledgements
I want to thank my love, Du Ying, no matter how difficult it is, she never gives up, always cares for me, silently supports me, and gives me courage when I lose confidence. Without her help, understanding, tolerance and support, I believe that the life of a PhD in these three years will be very different.