An Independent Agreement Study of Modified Pfirrmann Grading System for Cervical Inter-vertebral Disc Degeneration in Cervical Spondylotic Myelopathy

Background

Cervical spondylotic myelopathy (CSM) is the most serious type of cervical spondylosis. It refers to a group of symptoms, such as neck pain, limb dysfunction and even paralysis, caused by compression of the spinal cord due to degeneration of the cervical vertebrae, inter-vertebral discs, surrounding ligaments and other soft tissues.[1, 2] The hospitalization rate for CSM is 4.04 per 100,000 per year, which has doubled in the past 10 years, and the number of patients undergoing surgery each year has become more than 7 times higher.[3]

For CSM surgery, the choice of operative method results from a comprehensive evaluation. In addition to a detailed understanding of cervical spine stability, the operative segment, and the severity of spinal stenosis and spinal cord compression, it is also necessary to evaluate degeneration of the operative and adjacent cervical segments in order to make a more detailed operative plan.[4, 5] Preoperative degeneration, especially inter-vertebral disc degeneration (IDD), determines whether to adopt a fusion strategy, while postoperative degeneration of adjacent segments is one of the main reasons for reoperation. In past clinical work, because IDD lacked a clear definition, there were few reliable criteria for determining the surgical strategy for CSM patients or for judging the prognosis and risk factors of the operation. It is therefore of great importance to establish a comprehensive and rational grading system for IDD based on modern imaging examinations. Magnetic resonance imaging (MRI) is considered the best imaging tool for evaluating IDD because it shows both disc morphology and hydration.

In 2007, Griffith et al.[6] proposed the modified Pfirrmann grading system, which classifies lumbar inter-vertebral discs into 8 grades to support qualitative research on IDD (Table 1, Fig. 1). The 8 grades represent a progression from normal disc to severe disc degeneration. Grade 1 corresponds to no disc degeneration, while Grade 8 corresponds to end-stage degeneration.

Table 1

Modified Pfirrmann Grading System
Grade	Signal From Nucleus and Inner Fibers of Anulus*	Distinction Between Inner and Outer Fibers of Anulus at Posterior Aspect of Disc	Height of Disc
1	Uniformly hyperintense, equal to CSF	Distinct	Normal
2	Hyperintense (>presacral fat and <CSF) ± hypointense intranuclear cleft	Distinct	Normal
3	Hyperintense though <presacral fat	Distinct	Normal
4	Mildly hyperintense (slightly >outer fibers of anulus)	Indistinct	Normal
5	Hypointense (=outer fibers of anulus)	Indistinct	Normal
6	Hypointense	Indistinct	<30% reduction
7	Hypointense	Indistinct	30–60% reduction
8	Hypointense	Indistinct	>60% reduction
Grades 1, 2, and 3 are based on the signal intensity of the nucleus and inner fibers of the anulus. For Grade 4, the margins between the inner and outer fibers of the anulus at the posterior margin of the disc are indistinct. For Grade 5, the disc is uniformly hypointense, although there is no loss of disc space height. For Grades 6, 7, and 8, there is progressive loss of disc space height, which can be broadly classified as mild, moderate, and severe. Very occasionally, although obvious disc collapse is present, hyperintense signal from the nucleus and inner fibers of the anulus is preserved. This is recorded as a double entry, e.g., 4/7, with the former reporting the disc signal and the latter the degree of collapse.
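
The decision rule described in Table 1 can be sketched in code. The following is only an illustrative sketch, not part of the published grading system: the signal labels, function names and the handling of the exact 30% and 60% boundaries are assumptions made for the example, and the distinction between inner and outer fibers is not taken as an input because in Table 1 it follows from the signal category.

```python
# Hypothetical labels for the signal column of Table 1 (names are illustrative).
SIGNAL_GRADE = {
    "equal_to_csf": 1,                   # uniformly hyperintense, equal to CSF
    "between_presacral_fat_and_csf": 2,  # > presacral fat and < CSF
    "below_presacral_fat": 3,            # hyperintense though < presacral fat
    "mildly_hyperintense": 4,            # slightly > outer fibers of anulus
    "hypointense": 5,                    # = outer fibers of anulus
}


def height_grade(height_loss_pct):
    """Return the collapse grade (6, 7 or 8), or None when disc height is normal.

    Assumption: exactly 30% and exactly 60% are both assigned to Grade 7.
    """
    if height_loss_pct <= 0:
        return None
    if height_loss_pct < 30:
        return 6
    if height_loss_pct <= 60:
        return 7
    return 8


def modified_pfirrmann_grade(signal, height_loss_pct=0.0):
    """Return the grade as a string, using a double entry such as '4/7'
    when hyperintense signal is preserved despite disc collapse."""
    signal_grade = SIGNAL_GRADE[signal]
    collapse = height_grade(height_loss_pct)
    if collapse is None:
        return str(signal_grade)
    if signal == "hypointense":
        return str(collapse)
    return f"{signal_grade}/{collapse}"
```

For instance, a hypointense disc with 45% height loss maps to Grade 7, while a mildly hyperintense disc with the same height loss is reported as the double entry 4/7.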

For IDD in CSM, an adequate and rational grading system would standardize research terminology, allow easier communication among physicians and help to determine the surgical strategy in individual patients. However, no study has evaluated the modified Pfirrmann grading system and its application in cervical IDD, so it still requires independent validation. In view of the frequent need for MRI studies categorizing IDD in the evaluation of patients with CSM, our study aims to analyze the inter-observer reliability and intra-observer reproducibility of the modified Pfirrmann grading system. In addition, this is the first study to assess its application value in cervical IDD.

1. Materials And Methods

1.1 Patient Case Selection and Evaluation

This study was performed in accordance with the Declaration of Helsinki, and institutional review board approval was obtained from our ethics committee, with signed informed consent provided by all participating subjects. Database records of patients with CSM admitted to Shanghai Longhua Hospital from 2018 to 2019 were retrospectively collected and analyzed. Cervical spine MRI examinations and available clinical data were required for inclusion. Cases with concurrent cervical spine fracture, tumor, infection, or instrumentation in the cervical spine were excluded. A complete MRI examination had to include T2-weighted turbo spin-echo sagittal images without fat suppression to cover all grades of the modified Pfirrmann grading system. MRI data were acquired on a 1.5-T whole-body imaging system. The complete and available clinical data included demographic characteristics, complaints, spinal cord and neurological function, concomitant diseases, and treatment history.

One resident of our department, who did not participate in the later statistics and analysis, collected the cases from our patient database. Physicians who had treated the patients were not allowed to act as assessors. Six physicians from two specialties (three spine surgeons and three radiologists) volunteered to be evaluators. They were blinded to the identity of the patients, the treatment the patients received, and the original classification used in clinical care. To make the study sufficiently reliable, each evaluator was provided with the necessary original literature and relevant information for evaluating cases on the basis of the modified Pfirrmann grading system. Any differences of opinion about the system were discussed in face-to-face meetings before the assessment until all evaluators reached a consensus. Standard image reports were available to the evaluators as a reference. According to the modified Pfirrmann grading system, each of the six evaluators independently assigned a single grade to each cervical inter-vertebral disc (from C2–C3 to C7–T1).

Inter-observer reliability was evaluated by comparing the initial responses of all six evaluators. Intra-observer reproducibility was assessed by comparing each evaluator’s two responses to the same case, made 12 weeks apart; the cases were presented in random order to minimize recall bias.

All data analyses were performed using Statistical Package for the Social Sciences (SPSS) software (version 22.0). Because the disc grades are ordinal data, we used the intra-class correlation coefficient (ICC) and weighted kappa (wκ) to measure inter- and intra-observer agreement for the modified Pfirrmann grading system (two-way mixed-effects model, in which people effects are random and measures effects are fixed).[7] ICC makes it possible to analyze the data when observer agreement varies across multiple responses, while wκ makes it possible to assess agreement when not all disagreements are equally significant. ICC values are reported with a 95% confidence interval (CI). For each grade of the modified Pfirrmann grading system, Fleiss’s κ was used to assess inter-observer reliability, and intra-observer reproducibility was measured by Cohen’s κ.[8, 9] ICC values range from 0 to 1 and κ values from −1 to 1; the larger the value, the better the agreement. Based on the recommendations of Fleiss[10] and Landis et al.,[11] ICC was divided into three levels, with values of 0.00 to 0.40 considered poor agreement, 0.40 to 0.74 fair to good agreement, and 0.75 to 1.00 excellent agreement, while κ was divided into five levels, with values of 0.00 to 0.20 considered slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and 0.81 to 1.00 near perfect agreement (Table 2).
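
To illustrate how wκ gives partial credit for near misses on an ordinal scale, the following minimal sketch computes Cohen's weighted kappa for two sets of ratings in plain Python. It is not the SPSS procedure used in this study: the linear weighting scheme, the restriction to two raters, and the function name `weighted_kappa` are assumptions made for illustration only.

```python
def weighted_kappa(ratings_a, ratings_b, categories, quadratic=False):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    Disagreement weights are |i - j| / (k - 1) (linear, assumed here)
    or their square (quadratic), where k is the number of categories.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed cross-tabulation of the two raters' grades.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d * d if quadratic else d

    # Weighted observed disagreement and weighted chance-expected disagreement.
    observed_dis = sum(weight(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(weight(i, j) * row_totals[i] * col_totals[j] / n
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_dis / expected_dis
```

With this weighting, a disagreement between Grade 4 and Grade 5 is penalized far less than one between Grade 1 and Grade 8, which is the property that motivates wκ for graded scales.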

Table 2

Levels of Agreement for ICC and κ Statistics
ICC / Level of Agreement	κ / Level of Agreement
0.00–0.40 / Poor	0.00–0.20 / Slight
0.40–0.74 / Fair to good	0.21–0.40 / Fair
0.75–1.00 / Excellent	0.41–0.60 / Moderate
/	0.61–0.80 / Substantial
/	0.81–1.00 / Near perfect
The intra-class correlation coefficient (ICC) and kappa coefficient (κ value) are used for consistency testing and are indexes of the accuracy of classification.
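
The thresholds in Table 2 can be expressed as two small lookup functions. This is a sketch under stated assumptions: Table 2 lists 0.40 in both the poor and the fair-to-good ICC bands and leaves a gap between 0.74 and 0.75, so here 0.40 is assigned to fair to good and anything below 0.75 is treated as fair to good; negative κ values are not covered by Table 2 and are labelled separately. The function names are illustrative.

```python
def icc_level(value):
    """Map an ICC value to the agreement levels of Table 2.

    Assumption: 0.40 counts as 'Fair to good' and values in the
    0.74-0.75 gap are also treated as 'Fair to good'.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    if value < 0.40:
        return "Poor"
    if value < 0.75:
        return "Fair to good"
    return "Excellent"


def kappa_level(value):
    """Map a kappa value to the agreement levels of Table 2."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("kappa is expected to lie between -1 and 1")
    if value < 0.0:
        return "Below chance (not covered by Table 2)"
    if value <= 0.20:
        return "Slight"
    if value <= 0.40:
        return "Fair"
    if value <= 0.60:
        return "Moderate"
    if value <= 0.80:
        return "Substantial"
    return "Near perfect"
```

Applied to an ICC of 0.76 and a wκ of 0.82, these functions return "Excellent" and "Near perfect", respectively.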

2. Results

After applying the exclusion criteria, a total of 165 consecutive cases from our patient database from 2018 to 2019 were included in this study, comprising 94 males and 71 females with an average age of 63.5 ± 2.4 years (range, 43 to 85 years) (Table 3). These individuals had 990 cervical inter-vertebral discs altogether, and because each disc was graded by six evaluators, one assessment yielded 5940 records. After 12 weeks, we acquired another 5940 records.

Table 3

General Information of Patients
No.	Content	Number
1	Sex (Male/Female)	94 / 71
2	Age (mean), years	63.5 ± 2.4
3	Complaints	
	Neck pain	86
	Sensory dysfunction	160
	Motor dysfunction	154
4	Sensory dysfunction	
	Upper extremity	146
	Lower extremity	93
	Trunk	80
5	Motor dysfunction	
	Upper extremity	128
	Lower extremity	133
6	Bladder function	
	Normal	91
	Abnormal	74
7	Surgical / non-surgical treatment history	2 / 163

In the first assessment using the modified Pfirrmann grading system, the 5940 disc evaluations were classified as Grade 1 (6 evaluations), Grade 2 (30), Grade 3 (1346), Grade 4 (1178), Grade 5 (1602), Grade 6 (1123), Grade 7 (415), and Grade 8 (240). In the assessment 12 weeks later, the distribution was Grade 1 (4 evaluations), Grade 2 (19), Grade 3 (1357), Grade 4 (1285), Grade 5 (1498), Grade 6 (997), Grade 7 (488), and Grade 8 (292) (Table 4).

Table 4

Grades Assigned to Inter-vertebral Discs in the Two Assessments
	The First Assessment	The Second Assessment
Grade	Number / Proportion	Number / Proportion
1	6 / 0.1%	4 / 0.1%
2	30 / 0.5%	19 / 0.3%
3	1346 / 22.7%	1357 / 22.8%
4	1178 / 19.8%	1285 / 21.6%
5	1602 / 27.0%	1498 / 25.2%
6	1123 / 18.9%	997 / 16.8%
7	415 / 7.0%	488 / 8.2%
8	240 / 4.0%	292 / 4.9%
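
The record counts and the proportions in Table 4 follow from simple arithmetic, which the short sketch below reproduces. The grade counts are copied from Table 4; the variable and function names are illustrative, and the figure of six discs per patient reflects the C2–C3 to C7–T1 levels graded in this study.

```python
N_PATIENTS = 165
DISCS_PER_PATIENT = 6   # C2-C3 through C7-T1
N_EVALUATORS = 6

n_discs = N_PATIENTS * DISCS_PER_PATIENT   # 990 discs
n_records = n_discs * N_EVALUATORS         # 5940 records per assessment

# Grade -> number of evaluations, as reported in Table 4.
first_assessment = {1: 6, 2: 30, 3: 1346, 4: 1178,
                    5: 1602, 6: 1123, 7: 415, 8: 240}
second_assessment = {1: 4, 2: 19, 3: 1357, 4: 1285,
                     5: 1498, 6: 997, 7: 488, 8: 292}


def proportions(counts):
    """Return each grade's share of all evaluations as a percentage string."""
    total = sum(counts.values())
    return {grade: f"{100 * count / total:.1f}%"
            for grade, count in counts.items()}


for label, counts in (("First", first_assessment),
                      ("Second", second_assessment)):
    # Each assessment should account for every one of the 5940 records.
    assert sum(counts.values()) == n_records
    print(label, proportions(counts))
```

Both assessments sum to 5940 evaluations, and the recomputed percentages agree with those listed in Table 4.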

2.1 Inter-observer Reliability

Based on reliability analysis of the results from the six evaluators, the overall inter-observer agreement of the modified Pfirrmann grading system was excellent by ICC and near perfect by wκ: the ICC value was 0.76 [95% CI, (0.74, 0.78)] and the wκ value was 0.82 [95% CI, (0.78, 0.86)]. Inter-observer agreement for each of the six cervical discs (C2/3, C3/4, C4/5, C5/6, C6/7 and C7/T1) ranged from fair to good up to excellent by ICC and from substantial to near perfect by wκ, which indicated no significant difference in agreement among the cervical disc levels (Table 5).

Table 5

Reliability Analysis by Disc for Modified Pfirrmann Grading System
	Evaluations	ICC / Level of Agreement	wκ / Level of Agreement
All discs	5940	0.76 / Excellent	0.82 / Near perfect
C2/3	990	0.68 / Fair to good	0.77 / Substantial
C3/4	990	0.79 / Excellent	0.83 / Near perfect
C4/5	990	0.84 / Excellent	0.90 / Near perfect
C5/6	990	0.72 / Fair to good	0.74 / Substantial
C6/7	990	0.85 / Excellent	0.85 / Near perfect
C7/T1	990	0.74 / Fair to good	0.87 / Near perfect

We compared the agreement of spine surgeons with that of radiologists and found no significant difference between the two specialties [ICC = 0.85 (0.79–0.91), wκ = 0.77 (0.74–0.80)]. In addition, each specialty had excellent or near perfect agreement [spine surgeons: ICC = 0.88 (0.86–0.90), wκ = 0.90 (0.87–0.93); radiologists: ICC = 0.78 (0.74–0.82), wκ = 0.86 (0.81–0.91)] (Table 6).

Table 6

Reliability Analysis by Specialty of Evaluators for Modified Pfirrmann Grading System
	Evaluations	ICC / Level of Agreement	wκ / Level of Agreement
Intra-specialty comparison			
Spine surgeons	2970	0.88 / Excellent	0.90 / Near perfect
Radiologists	2970	0.78 / Excellent	0.86 / Near perfect
Inter-specialty comparison		0.85 / Excellent	0.77 / Substantial

2.2 Intra-observer Reproducibility

Similar to the first assessment, the repeated assessment after 12 weeks indicated that the inter-observer agreement of the modified Pfirrmann grading system was also excellent or near perfect [ICC = 0.79 (0.77, 0.81), wκ = 0.81 (0.77, 0.85)]. The subsequent reproducibility analysis of each evaluator's two sets of results showed excellent intra-observer agreement, with all ICC and wκ values of 0.80 or higher. In addition, the intra-observer agreement by disc level was excellent as well, showing no difference in agreement among the cervical disc levels (Table 7).

Table 7

Reproducibility Analysis for Modified Pfirrmann Grading System
Evaluator∗	ICC / Level of Agreement	wκ / Level of Agreement
A	0.89 / Excellent	0.90 / Near perfect
B	0.86 / Excellent	0.84 / Near perfect
C	0.80 / Excellent	0.83 / Near perfect
D	0.83 / Excellent	0.84 / Near perfect
E	0.91 / Excellent	0.89 / Near perfect
F	0.86 / Excellent	0.92 / Near perfect
Overall	0.84 / Excellent	0.87 / Near perfect
∗A, B, C, D, E, F represent the 6 evaluators who participated in the study.

3. Discussion

In recent years, the incidence of CSM has increased significantly.[12] Under the influence of environmental factors and aging, IDD, cervical facet joint degeneration and the formation of vertebral marginal osteophytes may cause spinal stenosis and chronic compression of the spinal cord, leading to neck pain, motor dysfunction and even paralysis.

At present, it is accepted that the pathological process of CSM mainly involves static, dynamic and ischemic mechanisms.[13, 14, 15] Cervical IDD is considered the trigger of the static mechanism. It changes the biomechanics of the cervical spine, which may induce spur formation at the vertebral endplate. Meanwhile, herniation of degenerative discs squeezes the ligamentum flavum and pushes it into the spinal canal, causing spinal canal stenosis. When the structure of the cervical spine is abnormal due to the static mechanism, flexion and extension of the cervical spine can precipitate irreversible damage to the spinal cord.[16] If cervical instability caused by IDD occurs in the motion segment, it results in dynamic compression of the spinal cord. As the pathological process progresses, the stability and joint degeneration of this segment gradually deteriorate, and the spinal canal becomes increasingly narrow.[13] In brief, IDD plays a leading role in the pathogenesis of CSM.

A recent study shows that non-surgical treatment is not suitable for moderate and severe CSM: there is no evidence that non-surgical treatment can effectively inhibit or reverse the natural history of CSM, and progression of the disease brings serious consequences, such as deterioration in quality of life, significant dysfunction and an adverse impact on surgical efficacy, together with a higher risk of secondary spinal cord injury or central cord syndrome. Therefore, it is generally believed that once CSM is diagnosed, surgery should be performed as early as possible.[17] As mentioned earlier, preoperative IDD largely determines surgical strategy, while postoperative IDD plays a decisive role in prognosis. With the continuous development of relevant grading systems, treatment concepts and techniques, surgical decision-making for CSM has further improved, providing important clinical value for standardized treatment.

In 2001, Pfirrmann et al.[18] developed the best-known MRI-based grading system, dividing IDD into five grades according to disc signal intensity, disc structure, distinction between nucleus and anulus, and disc height. Although this classification has been widely accepted and shown to have excellent inter- and intra-observer agreement,[19] a later study[6] found that it lacked discriminatory power when applied to IDD in the elderly spine. In addition, the images and descriptions provided left ambiguities in assigning IDD to one grade or another. To address these deficiencies, Griffith et al.[6] proposed a modified Pfirrmann classification that increased the number of grades from 5 to 8 (Table 1, Fig. 1), so as to improve discriminatory power when evaluating the elderly spine and to minimize ambiguity when selecting grades.

The establishment of the modified Pfirrmann grading system not only gives spine surgeons a clear definition of IDD, but also supports treatment planning and prediction of prognosis for patients with CSM. At present, the JOA[20] and NDI[21] scoring systems are the most commonly used criteria for evaluating the treatment of patients with CSM. In particular, the JOA system divides CSM into three levels (mild, moderate and severe) according to the score, which helps physicians determine whether patients need surgery as soon as possible. It is worth mentioning that both scoring systems focus on patients, especially their functional status, but neither the JOA nor the NDI score places emphasis on the cervical spine itself, whether the vertebrae, inter-vertebral discs, spinal canal or spinal cord. Hence the advantage of the modified Pfirrmann grading system is obvious.

However, it must be emphasized that cervical IDD is only one consideration for surgery. Other important related factors include non-surgical treatment, cervical spine stability, operative segment, severity of spinal stenosis and spinal cord compression, and prognosis. Thus, the modified Pfirrmann grading system can only be regarded as an important reference for surgery, and an optimal treatment scheme can be formulated by combining it with the JOA and NDI scores.

The results show that the inter-observer agreement in the two assessments (ICC: 0.76, 0.79; wκ: 0.82, 0.81) of the modified Pfirrmann grading system is slightly higher than that reported by Griffith et al.,[6] while the intra-observer agreement in this study is excellent (ICC: 0.84; wκ: 0.87), similar to that of Griffith et al.,[6] indicating that the modified Pfirrmann grading system has very good consistency. It is noteworthy that the evaluators involved in establishing the modified Pfirrmann grading system were all radiologists (two musculoskeletal radiologists and a general radiologist). In contrast, the six physicians in our study came from two specialties (three spine surgeons and three radiologists), which gave us a multi-angle and more comprehensive understanding of the imaging manifestations of IDD. This may be one of the factors behind the slight differences in results between the two articles.

The current study has limitations that could be addressed to better ascertain the inter- and intra-observer error of this grading system. Firstly, the sample size is relatively small. Although the number of patients included in our study exceeds that of Pfirrmann et al.[18] and Griffith et al.,[6] further expanding the sample population would allow more meaningful statistical testing of agreement. Secondly, there is recall bias from evaluators, namely the deviation between repeated assessments by each evaluator, as shown in Table 7. This deviation was mentioned by Wang YX et al.[22] in their study, which indicated that there was no significant difference between repeated assessments performed on the same day by the same evaluator, but that the deviation was obvious when the same evaluator made further assessments 8 months later. Thus, in any study setting, paired assessments should ideally be conducted within a short period of time, and the 12-week interval in our study might still be long. Thirdly, the difference in specialty is an important factor. Although the evaluators came from two specialties and the multidisciplinary team might increase the comprehensiveness of this study, we must point out that the radiologists did not specialize in the spine and lacked a deep understanding of IDD and the grading system, which may affect the accuracy of the final result. It may therefore be valuable to repeat this study with senior spine surgeons only, to explore whether a higher skill level and specialization produce better agreement than that achieved by junior evaluators or a multidisciplinary team. Finally, as mentioned above, postoperative IDD is a cause of poor prognosis and reoperation, but we excluded patients with instrumentation in the cervical spine in order to make a better judgment of the inter-vertebral disc. This issue was much debated when we designed the study; after long discussions, we decided to eliminate all objective confounding factors, including fracture, tumor, infection, and instrumentation. We will place more emphasis on postoperative IDD in our later studies. Therefore, high-quality, large-sample, multicenter studies should be performed in our future clinical work to provide spine surgeons with the best evidence-based information.

References

1. Fujiyoshi T, Yamazaki M, Kawabe J, et al. A new concept for making decisions regarding the surgical approach for cervical ossification of the posterior longitudinal ligament: the K-line. Spine. 2008;33:E990-E993.
2. Karadimas S, Erwin W, Ely C, et al. Pathophysiology and natural history of cervical spondylotic myelopathy. Spine. 2013;38:S21-36.
3. Wu JC, Ko CC, Yen YS, et al. Epidemiology of cervical spondylotic myelopathy and its risk of causing spinal cord injury: a national cohort study. Neurosurg Focus. 2013;35:E10.
4. Masaki Y, Yamazaki M, Okawa A, et al. An analysis of factors causing poor surgical outcome in patients with cervical myelopathy due to ossification of the posterior longitudinal ligament: anterior decompression with spinal fusion versus laminoplasty. J Spinal Disord Tech. 2007;20:7-13.
5. Sakai K, Okawa A, Takahashi M, et al. Five-year follow-up evaluation of surgical treatment for cervical myelopathy caused by ossification of the posterior longitudinal ligament: a prospective comparative study of anterior decompression and fusion with floating method versus laminoplasty. Spine (Phila Pa 1976). 2012;37:367-376.
6. Griffith JF, Wang YX, Antonio GE, et al. Modified Pfirrmann grading system for lumbar intervertebral disc degeneration. Spine (Phila Pa 1976). 2007;32:E708-12.
7. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420–8.
8. Fleiss J. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76:378–81.
9. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20:37–46.
10. Fleiss J. The design and analysis of clinical experiments. Wiley, New York. 1986;pp 1–31.
11. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–74.
12. Boogaarts HD, Bartels RHMA. Prevalence of cervical spondylotic myelopathy. European Spine Journal. 2013;24:139-141.
13. Young WF. Cervical spondylotic myelopathy: a common cause of spinal cord dysfunction in older persons. American Family Physician. 2000;62:1064-1070,73.
14. Clarke E, Robinson PK. Cervical myelopathy: a complication of cervical spondylosis. Brain. 1956;79:483-510.
15. Fehlings MG, Skaf G. A review of the pathophysiology of cervical spondylotic myelopathy with insights for potential novel mechanisms drawn from traumatic spinal cord injury. Spine. 1998;23:2730-2736.
16. Singh A, Tetreault L, Fehlings MG, et al. Risk factors for development of cervical spondylotic myelopathy: results of a systematic review. Evidence-based Spine-care Journal. 2012;3:35.
17. Rhee JM, Shamji MF, Erwin WM, et al. Nonoperative management of cervical myelopathy: a systematic review. Spine. 2013;38:55-67.
18. Pfirrmann CW, Metzdorf A, Zanetti M, et al. Magnetic resonance classification of lumbar intervertebral disc degeneration. Spine (Phila Pa 1976). 2001;26:1873–1878.
19. Urrutia J, Besa P, Campos M, et al. The Pfirrmann classification of lumbar intervertebral disc degeneration: an independent inter- and intra-observer agreement assessment. Eur Spine J. 2016;25:2728-33.
20. Japanese Orthopaedic Association. Criteria on the evaluation of the treatment of cervical spondylotic myelopathy. Journal of the Japanese Orthopedic Association. 1975;49.
21. Vernon H, Mior S. The Neck Disability Index: a study of reliability and validity. J Manipulative Physiol Ther. 1991;14:409-15.
22. Wang YX, Kuribayashi H, Wagberg M, et al. Gradient echo MRI characterization of development of atherosclerosis in the abdominal aorta in Watanabe Heritable Hyperlipidemic rabbits. Cardiovasc Intervent Radiol. 2006;29:605–12.
