The Measurements of Frailty and Their Possible Application to Spinal Conditions. A Systematic Review

Background: Frailty is associated with an increased risk of postoperative adverse events (AEs) within the surgical spine population. Multiple frailty tools have been reported in the surgical spine literature. However, the applicability of these tools remains unclear. The primary objective of this systematic review is to appraise the construct, feasibility, objectivity, and clinimetric properties of frailty tools reported in the surgical spine literature. Secondary objectives included determining the applicability and the most sensitive surgical spine population for each tool. Methods: This systematic review was registered with PROSPERO: CRD42019109045. Publications from January 1950 to December 2020 were identified by a comprehensive search of PubMed, Ovid, Embase, and Cochrane, supplemented by manual screening. Studies reporting and validating a frailty tool in the surgical spine population with a measurable outcome were included. Each tool and its respective clinimetric properties were evaluated using validated criteria and definitions. The applicability of each tool and its most sensitive surgical spine population was determined by panel consensus. Bias was assessed using the Newcastle-Ottawa Scale. Results: 47 studies were included in the final qualitative analysis. A total of 14 separate frailty tools were identified, of which nine assessed frailty according to the cumulative deficit definition, while four instruments utilized phenotypic or weighted frailty models. One instrument assessed frailty according to the comprehensive geriatric assessment (CGA) model. Twelve measures were validated as risk stratification tools for predicting postoperative AEs, while one tool investigated the effect of spine surgery on postoperative frailty trajectory. The modified frailty index (mFI), 5-item mFI, adult spinal deformity frailty index (ASD-FI), FRAIL Scale, and CGA had the most positive ratings for clinimetric properties assessed.
Conclusions: The assessment of frailty is important in the surgical decision-making process. Cumulative deficit and weighted frailty instruments are appropriate risk stratification tools. Phenotypic tools are sensitive for capturing the relationship between spinal pathology, spine surgery, and prehabilitation on frailty trajectory. CGA instruments are appropriate screening tools for identifying health deficits susceptible to improvement and guiding optimization strategies. Studies are needed to determine whether spine surgery and prehabilitation


Introduction
Concurrent with the ageing population, the number of elderly patients with comorbidities presenting to surgeons for surgical consideration is increasing 1 . This is concerning as these patients undergoing surgery are at an increased risk of postoperative adverse events (AEs) 1, 2 . This increased vulnerability was initially thought to be due to the effects of ageing and comorbidity burden. Recent evidence suggests that frailty imparts a substantial risk to the development of adverse outcomes [1][2][3][4][5] . Frailty is a syndrome characterized by the age-associated decline in physiological reserve and reduced resilience to stressors resulting in adverse health outcomes 6, 7 . The concept of frailty and its impact on health outcomes has been well validated in the geriatric literature. This relationship has only recently been investigated within the surgical spine population, with evidence identifying that frailty is significantly associated with postoperative AEs 8 .
Unfortunately, there is no standardized tool for assessing frailty due to the heterogeneity of the syndrome and the multiple systems affected. Two main models have been described to help operationalize frailty tools in a standardized and specific manner. The phenotypic model, described by Fried et al, conceptualizes frailty as a biological syndrome resulting from the age-associated decline across multiple physiological systems 6 . The frailty index (FI), proposed by Rockwood et al, conceptualizes frailty as a lifelong accumulation of age-related deficits 9 . Frailty occurs when a certain threshold of age-related deficits is reached and overwhelms the physiological reserve 9 . Several other surrogate markers for frailty have been described, such as sarcopenia. Defined as the progressive loss of skeletal muscle mass, strength, and power, sarcopenia can be the effect of musculoskeletal ageing, but it is not specific to frailty 10 .
The quality of evidence for each included study was evaluated using a 5-point scale derived from the Oxford Centre for Evidence-Based Medicine (Appendix-3) 22 . Any disagreements between the two lead reviewers were resolved by either panel consensus between all authors or adjudication by a third author (J.S.).
Finally, all authors participated in a panel evaluation to determine the clinical applicability of each frailty tool for either risk stratification or capturing the relationship between spinal pathology and surgical intervention on frailty trajectory. This was determined by reviewing the clinimetric properties assessed and whether the components for each frailty tool were modifiable or non-modifiable. The authors also determined the spine population(s) most sensitive for each frailty tool. This evaluation is important given the heterogeneity of the spine population, whereby different spinal pathologies impart different effects on frailty.

Results
The literature search retrieved a total of 8,268 publications, from which 43 were retained, along with four additional articles found in the authors' libraries or bibliographies of reviewed full-text articles (Figure-2). A total of 47 studies were included in the final analysis and data extraction.

Study Characteristics
Of the 47 included studies, frailty tools were reported in the following spine populations: degenerative disease, complex adult spinal deformity, oncology, trauma, and cervical fusion (Table-1). The remaining studies reported a frailty measure within the spondylodiscitis, anterior lumbar interbody fusion (ALIF), thoracolumbar instrumentation, or vertebral tuberculosis population. Several studies did not specify a spine population. Overall, most included studies were retrospective in design, applied an age inclusion criterion of 18 years or older, and reported postoperative AEs as the primary outcome of interest. A comprehensive summary of the study characteristics including study design, age inclusion criteria, outcome of interest, and outcome measure is outlined in Table-1.

Prevalence of Frailty
Significant differences in frailty prevalence were observed between different surgical spine populations due to the frailty tool used, the effect of underlying spinal pathology on frailty, and the cutoff values applied to stratify the study population into robust, pre-frail, and frail cohorts. A comprehensive summary of the frailty prevalence reported amongst the included studies and between different populations is outlined in Table-1. Frailty prevalence could not be calculated or identified from several studies due to insufficient information/data or the lack of cutoff values stratifying the population into robust, pre-frail, and frail cohorts. Overall, the prevalence of frailty was higher in the complex adult spinal deformity population.

Characteristics of Frailty Tools
The selected studies yielded 14 frailty tools representing the combination of 357 individual items designed to assess frailty domains (Table-1, Table-2). Five subscales were identified (Appendix-1). The total number of components (individual items) reported in a single frailty tool ranged from 5 to 109.
Nine of the 14 frailty tools operationalized frailty according to the accumulation of deficit model (Table-2). All nine of these frailty measures utilized a dichotomous scale to evaluate the presence or absence of deficits. Four measures calculated frailty as a ratio (n/t) of the sum of deficits present in the model (n) divided by the total number of deficits evaluated (t). Five tools calculated frailty as a whole number by the sum of deficits present within the model.
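The two scoring conventions described in the preceding paragraph can be illustrated with a short Python sketch. It is a minimal illustration only and does not reproduce the scoring procedure of any specific tool in this review; the function names and the deficit names in the example record are hypothetical.

```python
from typing import Mapping


def deficit_count(deficits: Mapping[str, bool]) -> int:
    """Whole-number score: the number of deficits recorded as present."""
    return sum(1 for present in deficits.values() if present)


def deficit_ratio(deficits: Mapping[str, bool]) -> float:
    """Ratio score n/t: deficits present (n) over deficits evaluated (t)."""
    t = len(deficits)
    if t == 0:
        raise ValueError("at least one deficit must be evaluated")
    return deficit_count(deficits) / t


# Hypothetical patient record; item names are illustrative assumptions,
# not the item list of any published frailty index.
patient = {
    "diabetes": True,
    "hypertension": True,
    "dependent_functional_status": False,
    "chronic_lung_disease": False,
}

print(deficit_count(patient))  # sum-of-deficits score
print(deficit_ratio(patient))  # n/t score
```

Each deficit is dichotomous (present or absent), so the ratio form always lies between 0 and 1, whereas the whole-number form depends on how many items the tool contains.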
Of the 47 studies reporting a cumulative deficit model, the modified frailty index (mFI) was the most frequently reported frailty tool, appearing in 26 studies (55.3%), followed by the 5-item mFI in six studies (12.8%), adult spinal deformity frailty index (ASD-FI) in five studies (10.6%), cervical deformity frailty index (CD-FI) in three studies (6.4%), and the metastatic spinal tumour frailty index (MSTFI) in three studies (6.4%). The modified cervical deformity frailty index (mCD-FI), primary spinal tumour frailty index (PSTFI), frailty based score (FBS), and the modified frailty score (MFS) were the least reported cumulative deficit measures (Table-1 and Table-2).
Of the 26 studies utilizing the mFI, 15 studies reported predefined cutoff values to stratify the study population into robust, pre-frail, or frail cohorts, while the remaining 10 studies reported a continuous dose-response ratio (Table-2). Only one study reported both predefined mFI cutoff values and a continuous dose-response ratio 42 . Similarly, predefined robust, pre-frail, and frail values were reported by all studies using the 5-item mFI (Table-2). All studies utilizing the ASD-FI reported predefined cutoff values to stratify patients into robust, frail and severely frail cohorts (Table-2). Predefined MSTFI values were reported in two studies to stratify patients into mild, moderate and severely frail MSTFI scores (Table-2). One study reported the MSTFI as a continuous score 36 . Studies reporting the CD-FI applied predefined values to stratify patients into robust, frail and severely frail cohorts or non-frail and frail cohorts. Predefined values were applied in the studies reporting the mCD-FI and PSTFI. The studies reporting the FBS and the MFS did not use cutoff values.
The FRAIL Scale and Fried Phenotype measures are ordinal scores containing items operationalized according to the phenotypic frailty model (Table-2). Frailty is calculated based on the sum of the items present within each tool. Predefined robust, pre-frail and frail cutoff values were reported by the studies utilizing these measures. The Hospital Frailty Risk Score (HFRS) and Risk Analysis Index (RAI) operationalized frailty according to a weighted scale system. Components are derived from either the phenotypic, cumulative deficit, or comprehensive geriatric assessment (CGA) frailty models. The HFRS and RAI use predefined values to stratify the study population into robust, pre-frail, frail or severely frail cohorts. Lastly, one study operationalized frailty according to the CGA model. The CGA examines frailty using validated subscales with predefined values to identify the presence of the frailty domain. The CGA calculates frailty on an ordinal scale, and a predefined criterion identifies the frailty syndrome (Table-2).
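A weighted scale followed by cutoff-based stratification, as described above, can be sketched as follows. This is an illustrative sketch under stated assumptions: the item names, weights, cutoff values, and function names are all hypothetical placeholders and are not the published values of the HFRS, RAI, or any other tool.

```python
from bisect import bisect_right
from typing import Iterable, Mapping, Sequence


def weighted_score(items_present: Iterable[str], weights: Mapping[str, float]) -> float:
    """Sum the weights of the items recorded as present for a patient."""
    return sum(weights[item] for item in items_present)


def stratify(score: float, cutoffs: Sequence[float], labels: Sequence[str]) -> str:
    """Map a score to a cohort label.

    `cutoffs` must be sorted ascending; each cutoff is the lowest score
    that belongs to the next (frailer) cohort.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return labels[bisect_right(cutoffs, score)]


# Hypothetical weights and cutoffs, chosen only to make the example run.
WEIGHTS = {"item_a": 2.0, "item_b": 3.5, "item_c": 1.0}
CUTOFFS = [5.0, 15.0]
LABELS = ["robust", "pre-frail", "frail"]

score = weighted_score(["item_a", "item_b"], WEIGHTS)
print(score, stratify(score, CUTOFFS, LABELS))
```

The same `stratify` helper applies to unweighted sum scores; only the cutoffs and labels change, which is one reason prevalence estimates differ between studies that apply different cutoff values.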
The most common frailty domains assessed were comorbidity status (93%), function (86%), nutrition and weight (79%), cognition (50%), and mood and mental health (43%). Domains of energy, strength, fall risk, and continence were assessed in 36% of included measures. Laboratory features and social support were assessed in 29% of measures, while general health and polypharmacy were assessed in 14%. Clinical symptoms/signs, vision or hearing impairment, living status, and slow gait speed were assessed in 7% of measures. Two tools included non-frailty domains such as surgical approach and tumor-specific radiographic features. None of the frailty tools assessed the domains of care goals, advanced directives, sexual function, dentition, or spirituality. Ten of the frailty tools identified were validated for use in a clinical context. The remaining four were validated for use in either a clinical or community context (Table-2). Special equipment or training was reported for three of the frailty tools. Notably, no study reported the time required to complete each measure.

Predictors of Outcome
Of the 14 frailty tools, 13 were evaluated as predictors of postoperative AEs or postoperative functional outcomes (Table-2). Only one of the tools investigated the effect of spine surgery on postoperative frailty trajectory (Table-2). The remaining tool was not appropriately evaluated for predicting postoperative outcomes. Appendix-2 contains a detailed summary of the predictive validity for each frailty tool.

Modified Frailty Index (mFI)
Of the 26 studies reporting the use of the mFI, the validity as a risk stratification tool for predicting postoperative AEs was assessed in 23 studies using appropriate statistical methodology. Within the degenerative spine population undergoing complex primary elective spine surgery, the mFI significantly and independently predicted postoperative AEs including mortality, major and minor morbidity, prolonged postoperative LOS, adverse discharge disposition, and unplanned readmission and reoperation (Appendix-2). Further receiver operating characteristic (ROC) analysis identified acceptable sensitivity for the mFI to predict postoperative AEs within this patient population (Appendix-2). However, in the degenerative spine population undergoing non-complex spine surgery, the mFI was not a significant or sensitive predictor of postoperative AEs (Appendix-2).
Within the complex adult spinal deformity population, the mFI significantly and independently predicted postoperative AEs including mortality, major and minor morbidity, and hardware/implant complications with excellent sensitivity after ROC analysis (Appendix-2). Few studies assessed the validity of the mFI as a risk stratification tool for predicting postoperative AEs in the spine trauma population (Appendix-2). Initial validation demonstrated that the mFI weakly predicted postoperative AEs following surgical stabilization of thoracolumbar fractures (Appendix-2). Further validation demonstrated that the mFI did not predict postoperative AEs including mortality, adverse discharge disposition, or prolonged postoperative LOS following complex spine surgery for traumatic spinal cord injury (tSCI) (Appendix-2).
Similarly, in the spine tumor population undergoing complex spine surgery, the small number of studies and conflicting evidence limit the validity of the mFI as a risk stratification tool for predicting postoperative AEs. Initial validation demonstrated that the mFI weakly predicted 30-day postoperative mortality and prolonged postoperative LOS with poor sensitivity (Appendix-2). Further validation identified that the mFI was not predictive of postoperative AEs including morbidity, mortality, and prolonged postoperative LOS (Appendix-2).
Within several unique spine populations, such as patients with spondylodiscitis or undergoing cervical fusion and anterior lumbar interbody fusion (ALIF), the mFI weakly predicted postoperative AEs including mortality and major morbidity (Appendix-2). When validated in several non-specific surgical spine populations, the mFI significantly predicted postoperative AEs including major complications, mortality, postoperative surgical site infection, and prolonged postoperative LOS (Appendix-2).
Finally, pre-frail and frail mFI scores were significantly associated with lower 2-year postoperative functional and symptomatic scores following spine surgery for complex adult spinal deformity (Appendix-2). However, the mFI was not associated with any differences in 2-year postoperative radiographic outcomes (Appendix-2). Similarly, pre-frail and frail mFI scores were not associated with differences in 2-year postoperative functional and symptomatic outcomes in the degenerative spine population (Appendix-2).
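The ROC analyses referred to in this section summarize how well a frailty score separates patients who experienced an AE from those who did not. As a minimal sketch, assuming a continuous or ordinal score and a binary outcome, the area under the ROC curve (AUC) can be computed as the proportion of event/non-event patient pairs in which the patient with the event has the higher score, with ties counted as half. The function name and the example data are hypothetical and are not drawn from any included study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], events: Sequence[bool]) -> float:
    """AUC via pairwise comparison of event and non-event scores."""
    if len(scores) != len(events):
        raise ValueError("scores and events must have the same length")
    with_event = [s for s, e in zip(scores, events) if e]
    without_event = [s for s, e in zip(scores, events) if not e]
    if not with_event or not without_event:
        raise ValueError("both outcome groups must be non-empty")

    concordant = 0.0
    for p in with_event:
        for n in without_event:
            if p > n:
                concordant += 1.0  # event patient scored higher
            elif p == n:
                concordant += 0.5  # tie counts as half
    return concordant / (len(with_event) * len(without_event))


# Made-up scores (sum of deficits) and outcomes for illustration only.
example_scores = [0, 1, 1, 2, 3]
example_events = [False, False, True, True, True]
print(roc_auc(example_scores, example_events))
```

An AUC of 0.5 indicates no discrimination and 1.0 indicates perfect separation. The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based implementation would be preferable for large registries.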

Adult Spinal Deformity Frailty Index (ASD-FI)
Of the five studies reporting the ASD-FI, three evaluated its validity as a risk stratification tool for predicting postoperative AEs while two studies assessed the association between ASD-FI and postoperative functional outcomes in the complex adult spinal deformity population. As a risk stratification tool, the ASD-FI significantly predicted postoperative AEs including major complications, prolonged postoperative LOS and reoperation (Appendix-2). Baseline preoperative ASD-FI scores significantly correlated with preoperative functional disability and 2-year postoperative functional outcomes (Appendix-2). Lastly, mild and severely frail ASD-FI scores were associated with worse baseline spinopelvic radiographic parameters including C7-S1 Sagittal Vertical Axis (SVA), Pelvic Incidence-Lumbar Lordosis (PI-LL) mismatch, and Pelvic Tilt (PT) (Appendix-2). Mild and severely frail ASD-FI scores were only weakly associated with significant differences in 3-year postoperative C7-S1 SVA (Appendix-2). Regarding functional outcomes, mild and severely frail ASD-FI scores were only associated with differences in standardized 1-year and 3-year postoperative functional outcomes (Appendix-2). When analyzing for change in postoperative functional outcome, the ASD-FI was only associated with improvements in 1-year and 3-year postoperative Scoliosis Research Society-22 (SRS-22) scores (Appendix-2).

Metastatic Spinal Tumour Frailty Index (MSTFI)
All studies reporting the MSTFI assessed its validity as a risk stratification tool for predicting postoperative AEs within the metastatic spinal tumor population. Initial validation identified that the MSTFI significantly predicted postoperative major AEs and mortality with moderate discrimination and sensitivity (Appendix-2). Mild, moderate and severely frail MSTFI scores were also associated with significant differences in postoperative LOS (Appendix-2). However, further external validation identified that the MSTFI is not a predictor of postoperative AEs including mortality, major complications or prolonged postoperative LOS (Appendix-2). Further ROC analysis demonstrated poor sensitivity of the MSTFI to predict postoperative major AEs and overestimation of the MSTFI to predict postoperative in-hospital mortality (Appendix-2).

5-item Modified Frailty Index (5-item mFI)
All studies reporting the 5-item mFI assessed its validity as a risk stratification tool for predicting postoperative AEs. Within the degenerative population undergoing primary elective complex cervical and lumbar spine surgery, the 5-item mFI significantly predicted postoperative AEs including mortality, major and minor AEs, adverse postoperative discharge disposition, prolonged postoperative LOS, and unplanned postoperative readmission and reoperation (Appendix-2). Further ROC analysis demonstrated good to excellent sensitivity of the 5-item mFI to predict postoperative AEs (Appendix-2). However, in the degenerative population undergoing non-complex lumbar spine surgery, the 5-item mFI did not significantly predict postoperative AEs (Appendix-2). When applied within the complex adult spinal deformity population, the 5-item mFI significantly predicted postoperative AEs including major AEs and hardware-related complications (Appendix-2).

Cervical Deformity Frailty Index (CD-FI)
Two studies assessed the validity of the CD-FI as a risk stratification tool for predicting postoperative AEs within the adult cervical deformity population undergoing complex spine surgery. The remaining studies assessed the effect of spine surgery on postoperative frailty trajectory or the association between CD-FI and postoperative radiographic and functional outcomes (Appendix-2). As a risk stratification tool, only severely frail CD-FI scores predicted 2-year postoperative major AEs (Appendix-2). Frail CD-FI scores were not significantly predictive of 2-year postoperative major AEs (Appendix-2).
With regard to postoperative frailty trajectory, initial validation identified that spine surgery for cervical spine deformity significantly improved 1-year postoperative CD-FI scores (Appendix-2). Postoperative improvements in weakness, anxiety, driving, fatigue, exhaustion, concentration, recreation, activity, mobility, and depression were the most significant factors associated with improvement in postoperative frailty trajectory (Appendix-2). Improvement in baseline to 1-year postoperative spinopelvic radiographic parameters was also associated with significant improvements in 1-year postoperative frailty (Appendix-2). After further analysis, successful spine surgery and improvement in exhaustion were the two variables most predictive of 1-year postoperative improvement in frailty (Appendix-2).
The CD-FI was significantly associated with worse preoperative function and symptom scores in frail patients with cervical spine deformity awaiting spine surgery (Appendix-2). In terms of baseline radiographic parameters, frail CD-FI scores were associated with worse Sagittal Vertical Axis (SVA) alignment than the non-frail cohort (Appendix-2). However, there was no significant difference in either 3-month or 1-year postoperative radiographic changes between frail and non-frail CD-FI score cohorts (Appendix-2). The CD-FI was associated with significant differences in standardized 1-year postoperative functional and symptomatic outcomes including the Neck Disability Index (NDI), modified Japanese Orthopedic Association (mJOA) and EuroQol-5D (EQ5D) scores between non-frail and frail patients (Appendix-2). Following unadjusted analysis, the CD-FI was only associated with significant improvements in 1-year postoperative EQ5D scores (Appendix-2).

Modified Cervical Deformity Frailty Index (mCD-FI)
The validity of the mCD-FI as a risk stratification tool was assessed by one study in the cervical deformity population. Initial validation demonstrated that only severely frail mCD-FI scores predicted postoperative mortality (Appendix-2). Further analysis did not identify the same association for frail mCD-FI scores (Appendix-2).

Primary Spinal Tumour Frailty Index (PSTFI)
The validity of the PSTFI as a risk stratification tool was assessed by one study in the primary spinal tumour population. Initial validation identified that the PSTFI predicted postoperative major AEs with moderate sensitivity after ROC analysis (Appendix-2). However, external validation identified that only severely frail PSTFI scores weakly predicted 30-day postoperative AEs, with poor sensitivity after ROC analysis (Appendix-2).

Frailty Based Score (FBS)
Only one study assessed the validity of the FBS as a risk stratification tool within the cervical fusion population. Initial validation identified that the FBS significantly predicted any 30-day postoperative AEs, including unplanned readmission and unplanned reoperation, with moderate discrimination and sensitivity (Appendix-2).

FRAIL Scale
Two studies assessed the validity of the FRAIL Scale as a risk stratification tool for predicting adverse postoperative cognitive and functional recovery. Within the degenerative population undergoing complex and non-complex cervical and lumbar spine surgery, the FRAIL Scale significantly predicted a reduced likelihood of 3-month postoperative functional recovery in the frail cohort (Appendix-2). The FRAIL Scale did not predict 3-month postoperative cognitive recovery or 3-day postoperative functional recovery (Appendix-2). Further external validation in a non-specific population undergoing elective spine surgery identified that the FRAIL Scale significantly predicted postoperative delirium in the frail cohort (Appendix-2).

Fried Frailty Phenotype Measure
The validity of the Fried Phenotype as a risk stratification tool for predicting postoperative AEs was assessed by one study within the thoracolumbar degenerative and deformity population. Initial validation identified that the Fried Phenotype did not predict six-week postoperative AEs including major AEs and adverse postoperative discharge disposition (Appendix-2). The Fried Phenotype also did not predict postoperative unplanned readmission or prolonged postoperative LOS (Appendix-2).

Hospital Frailty Risk Score (HFRS)
Only one study assessed the validity of the HFRS as a risk stratification tool for predicting postoperative AEs within the degenerative spine population. Initial validation demonstrated that moderate and severely frail scores predicted postoperative admission to critical care, the total incidence of postoperative AEs, adverse postoperative discharge disposition, postoperative unplanned readmission or emergency department visit, prolonged postoperative LOS and increased direct costs (Appendix-2). Further ROC analysis with the inclusion of the HFRS identified a greater sensitivity to predict postoperative AEs (Appendix-2).

Risk Analysis Index (RAI)
The validity of the RAI as a risk stratification tool for predicting postoperative AEs was assessed by one study within a non-specific surgical spine population.
Initial validation identified that pre-frail and frail RAI scores were significantly associated with a higher rate of postoperative readmission, mortality, and longer postoperative LOS compared to non-frail scores (Appendix-2). Pre-frail and frail RAI scores had a higher critical care admission rate than non-frail scores, but this did not reach statistical significance (Appendix-2). Further analysis observed that pre-frail and frail RAI scores significantly predicted 1-year postoperative mortality (Appendix-2).

Comprehensive Geriatric Assessment (CGA)
The validity of the CGA as a risk stratification tool for predicting postoperative AEs was assessed by one study within the degenerative population undergoing non-complex and complex lumbar spine surgery. Initial validation demonstrated that the CGA significantly predicted 30-day postoperative AEs including minor and major AEs (Appendix-2). Further analysis identified that the CGA predicted a greater likelihood of 30-day postoperative major and minor AEs in the complex fusion cohort (Appendix-2).

Modified Frailty Score (MFS)
Lastly, one study reported the association between the MFS and postoperative AEs in the vertebral tuberculosis population (Appendix-2). Initial observation demonstrated that the value of the MFS was significantly higher in the 30-day postoperative mortality cohort than the survival cohort (Appendix-2). However, the authors did not perform any formal analysis establishing the predictive validity of the MFS.

Clinimetric Properties, Objectivity, Feasibility, and Applicability
Predictive validity was the most commonly assessed clinimetric property across all the included studies (Table-3, Appendix-2). Content and concurrent validity, responsiveness, and reliability were the second most assessed clinimetric properties. The mFI, ASD-FI, 5-item mFI, FRAIL Scale, and CGA had the most positive ratings. None of the instruments identified had positive ratings for all the clinimetric properties. The MFS was the only instrument without any rating since none of the clinimetric properties were assessed. Appendix-2 summarizes the evidence evaluating the clinimetric properties of the frailty tools within the surgical spine literature. A more comprehensive summary of this evaluation is described in Table-3.
The mFI, FRAIL Scale, mCD-FI, 5-item mFI, MSTFI, FBS, MFS, RAI, and CGA were all clinically feasible tools (Table-3). Of these, the mFI, mCD-FI, 5-item mFI, MSTFI, MFS, RAI, and CGA were objective tools. The remaining measures were neither feasible nor objective. Nine of the 14 frailty tools are only applicable as risk stratification tools (Table-3). This is due to the non-modifiable constructs of these measures that cannot capture clinical changes in frailty or the initial validation of these instruments as risk stratification tools. The FRAIL Scale, Fried Phenotype, ASD-FI, and CD-FI are applicable as either risk stratification tools or frailty trajectory tools. The constructs of these measures contain modifiable items sensitive to improvement. Only one frailty tool identified is not clinically applicable due to an absence of information assessing any clinimetric property.

Discussion
Although not necessarily synonymous with ageing, the prevalence of frailty is increasing in the surgical spine population 1,70 . This is concerning as frail patients undergoing spine surgery are at an increased risk of adverse postoperative outcomes 8 . Accordingly, the assessment of frailty is an important factor in the surgical decision-making process regarding surgical risk, invasiveness, and timing. However, the applicability of these instruments as risk stratification or frailty trajectory tools is unknown. This is due to the heterogeneity of, and lack of consensus among, the frailty tools currently reported and the effect of underlying spine disease on frailty.
Similar reviews assessing the clinimetric properties and applicability of frailty tools have been completed in different contexts 16,18,71,72 . To our knowledge, this review is the first to evaluate the objectivity, feasibility, applicability, and sensitivity of frailty tools reported in the surgical spine literature. Additionally, this systematic review is the first that has rigorously evaluated the clinimetric properties of frailty tools reported in the surgical spine literature using a validated set of qualitative criteria and definitions. One of the most important outcomes identified in our review is that although most tools were predictive of postoperative outcomes, many lacked formal evaluation of important clinimetric properties. Additionally, several frailty measures were not objective or clinically feasible. This was due to items (subjective questions) or techniques (lengthy questionnaires) common to these measures that cannot be reliably or reasonably completed in clinical practice.

Risk Stratification Tools
The mFI, developed and validated by Velanovich et al, was constructed by matching 11 variables found within the National Surgical Quality Improvement Program (NSQIP) database to those within the 70-item Canadian Study of Health and Aging frailty index (CSHA-FI) 73 . Since its development, the mFI has been extensively validated as a risk stratification tool for predicting postoperative AEs across the surgical literature 74 . In recent years, an increase in the missing proportion of variables required to calculate the mFI has raised concern about its validity as a risk stratification tool 75 . To overcome this, Chimukangara et al identified the top five most reported mFI variables within the NSQIP database, condensing the mFI into the 5-item mFI 76 . Across the surgical literature, the 5-item mFI is recognized as a valid risk stratification tool for predicting postoperative AEs 76-78 .
Within the degenerative and deformity populations undergoing complex spine surgery, the mFI and 5-item mFI are sensitive risk stratification tools for predicting postoperative AEs. These tools have been validated using a robust study methodology in large cohorts with accurate, precise and reproducible risk estimates. Additionally, the mFI and 5-item mFI are reliable tools given the high degree of concordance between their respective frailty tiers. Lastly, since few deficits are required to assess frailty, both tools are easily applicable without the need for an extensive chart review, special tests or training.
The mFI is not a sensitive risk stratification tool in the non-complex degenerative, tumor, or trauma spine populations due to conflicting evidence, poor study methodology, and construct limitations of the mFI. Since the mFI is mainly composed of deficits that assess comorbidity status, it is not sensitive for assessing the multiple systems affected by frailty. Consequently, in healthy patients with little to no comorbidities undergoing spine surgery, the mFI is significantly underpowered as a risk stratification tool 24,27 . In the tumor population, the construct does not account for the physiological effects of metastatic disease, such as tumour burden and adjunctive therapy. These factors influence underlying physiological reserve and confound the relationship between frailty and postoperative AEs 35,36,79 . Within the thoracolumbar trauma population, poor study design and insufficient evidence limit the validity of the mFI as a risk stratification tool. Finally, in the tSCI population, the magnitude of the injury, patient age, and total motor score on admission overpower any association between the mFI and postoperative AEs 33 .
The constructs of the mFI and 5-item mFI significantly deviate from the general multisystem concept of frailty. A valid frailty index must contain 30-40 deficits in which each deficit covers a range of systems, is associated with overall health status, increases in prevalence with age, and cannot saturate early 80.
Frailty indices containing few deficits, such as the mFI and 5-item mFI, are prone to instability and imprecise index estimates 80. Furthermore, during the design of the mFI and 5-item mFI, the reduction of frailty deficits from the 70-item CSHA-FI was performed without analysis of convergent validity 81. This raises concern as to whether the mFI and 5-item mFI measure the same construct as the CSHA-FI. Lastly, the non-modifiable constructs of the mFI and 5-item mFI limit the sensitivity of these frailty tools to capture clinical changes. Yagi et al identified that despite optimization of each mFI factor, no significant reduction in postoperative AEs was observed when compared against the non-frail cohort 30. Therefore, the mFI and 5-item mFI are applicable as risk stratification tools only.
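The coarseness of a short deficit index can be illustrated with a minimal sketch. It assumes the conventional deficit accumulation calculation (deficits present divided by deficits assessed); the function names and the item labels are hypothetical and are not taken from any of the tools reviewed here. It is not a reproduction of the analysis cited above, only an illustration of why a five-item index moves in much larger steps than a 40-item index.

```python
from typing import Mapping, Optional


def frailty_index(deficits: Mapping[str, Optional[bool]]) -> float:
    """Proportion of assessed deficits that are present (0.0 to 1.0).

    A value of None marks a deficit that was not recorded; it is
    excluded from both the numerator and the denominator.
    """
    assessed = [present for present in deficits.values() if present is not None]
    if not assessed:
        raise ValueError("no deficits were assessed")
    return sum(assessed) / len(assessed)


def step_size(n_deficits: int) -> float:
    """Change in the index caused by a single deficit."""
    return 1 / n_deficits


# Hypothetical patients: item names are placeholders, not real mFI variables.
short_index = {f"item_{i}": i < 2 for i in range(5)}    # 2 of 5 deficits present
long_index = {f"item_{i}": i < 16 for i in range(40)}   # 16 of 40 deficits present

print(frailty_index(short_index))   # 0.4
print(frailty_index(long_index))    # 0.4
print(step_size(5), step_size(40))  # 0.2 0.025
```

Both hypothetical patients score 0.4, but in the five-item index one misclassified or missing deficit shifts the score by 0.2, whereas in the 40-item index the same error shifts it by 0.025.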
The ASD-FI, developed by Miller et al, was constructed using variables within the International Spine Study Group (ISSG) database that met the frailty index inclusion criteria 48. Cutoff values were then applied to stratify the population into robust, frail, and severely frail cohorts. Since its development, the ASD-FI has been demonstrated to be a valid risk stratification tool for predicting postoperative AEs within the complex adult spinal deformity population. The ASD-FI also has several strengths as a risk stratification tool compared to the mFI and 5-item mFI. The ASD-FI was developed using a standard methodology for creating accurate and precise frailty indexes 50. The ASD-FI is also a more sensitive frailty tool as it evaluates a greater number of health domains within the frailty syndrome. The ASD-FI has also been extensively validated within the complex adult spinal deformity population as a risk stratification tool. In a series of studies by Miller et al, the ASD-FI reliably predicted 2-year postoperative AEs in external and internal validation cohorts 48-50. In contrast, the mFI and 5-item mFI were validated either in a large national cohort with limited follow-up periods, underestimated complication rates and missing patient variables, or in small cohorts where patient age, lifestyle, and ethnicity impact surgical outcomes 28, 29. However, the number of deficits required to calculate the ASD-FI makes it clinically unfeasible. Given this, the mFI and 5-item mFI are more appropriate risk stratification frailty tools in the adult spinal deformity population.
The CD-FI was developed in the same fashion as the ASD-FI for use in the cervical deformity population as a risk stratification tool 80. Passias et al further condensed the CD-FI to a 15-item mCD-FI by identifying the health deficits most predictive of the overall CD-FI score 56. The CD-FI and mCD-FI were internally validated as risk stratification tools in the cervical deformity population 53,55,56. However, it is unknown whether these measures are valid or sensitive risk stratification tools for predicting postoperative AEs or functional outcomes. This is due to the lack of external validation studies, conflicting evidence, and poor methodological design of the current validation study 55.
As the ASD-FI and CD-FI contain several modifiable frailty deficits that overlap with clinical features of spinal disease, these measures are sensitive to capturing the effect of spine surgery on postoperative frailty trajectory. Segreto et al identified a significant reduction in 1-year postoperative CD-FI scores following spine surgery for cervical deformity 54. However, responsiveness was evaluated by a t-test that only compares differences in the score. This methodology does not assess the validity of the score change in relation to the CD-FI construct to capture responsiveness. Accordingly, the ASD-FI and CD-FI are more appropriate risk stratification tools given the lack of literature assessing the responsiveness of these measures.
Although the ASD-FI, CD-FI, and mCD-FI are promising frailty tools, some concerns may limit the applicability of these tools. Firstly, the cutoff values chosen to stratify frailty severity were determined without any formal analysis. The health deficits included within these tools were also derived from questionnaires commonly utilized in spine practice. Consequently, the ASD-FI, CD-FI, and mCD-FI may overestimate frailty and the associated predicted risk. Additionally, no formal sensitivity analysis has been performed assessing the performance of these measures against other frailty tools. Lastly, the need to acquire all 42 deficits to calculate the ASD-FI and CD-FI significantly hinders the clinical applicability of these tools.
The MSTFI and PSTFI were constructed as risk stratification tools for the metastatic and primary spine tumor populations 62, 64. De la Garza Ramos et al constructed the MSTFI by identifying patient recorded variables within a national multicenter database that had the greatest independent effect for predicting postoperative AEs 62. Nine variables were identified to construct the MSTFI, and cutoff scores were applied to stratify patients into robust, mild, moderate, and severe frail cohorts. The PSTFI was developed using items within the MSTFI, except those pertaining to surgical approach 64. Cutoff values were similarly applied to stratify patients according to frailty severity.
Within the metastatic spine tumor population, both the mFI and MSTFI demonstrated significant heterogeneity and difficulty in predicting postoperative AEs 34,62,63. Initial validation by De la Garza Ramos et al suggested the MSTFI was an appropriate risk stratification tool 62. However, external validation by Massaad et al identified that the predicted outcomes stratified by MSTFI severity were not consistent with those reported in the initial validation study 63. The authors observed that the MSTFI overestimated the risk of postoperative AEs for severely frail patients while underestimating the risk for mildly frail patients 63.
Bourassa-Moreau et al observed that neither the mFI nor the MSTFI was associated with or predictive of postoperative AEs 36. Consequently, given the heterogeneity and inconsistency, no recommendation can be made as to whether the mFI or MSTFI are appropriate risk stratification tools for this spine population. This highlights the challenge of defining and quantifying frailty in the metastatic spine tumour population. Further efforts are required to improve the determination of frailty in this specific surgical cohort.
Similarly, determining the most sensitive frailty tool for the primary spine tumour population is difficult. Our review observed that the mFI and PSTFI weakly predict postoperative AEs with wide confidence intervals and relatively poor sensitivity. Additionally, patients with primary spine tumors are often younger and less likely to have comorbidities or present with clinical features of frailty. Consequently, comorbidity-based frailty tools such as the mFI or PSTFI are not sensitive for evaluating frailty within this population. Furthermore, since the PSTFI is derived from the MSTFI, it is poorly sensitive for assessing frailty in the primary spinal tumour population.
As frailty tools, the constructs of the MSTFI and PSTFI are not designed to evaluate frailty. The MSTFI and PSTFI contain surgical, radiographic, and laboratory items that are not sensitive or specific to frailty. The limited number of deficits within these frailty tools is also problematic. It increases the potential for imprecise index estimates, and when applied to small healthy cohorts, the lack of deficits significantly reduces the ability to detect a relationship with adverse outcomes 36. The cutoff values applied to stratify frailty severity were also chosen without any formal assessment. Finally, given the non-modifiable constructs of these measures, the MSTFI and PSTFI are only applicable as risk stratification tools. The need for medical imaging or extensive chart review may hinder these measures' feasibility due to extensive time requirements.
Similar to the mFI and the 5-item mFI, the FBS was constructed using commonly reported variables within the NSQIP database 82. The FBS was initially validated as a risk stratification tool for the vascular surgery population 82. Medvedev et al further validated its use as a risk stratification tool in the surgical spine population to predict postoperative AEs 65. However, the clinical applicability of the FBS and its most sensitive surgical spine population cannot be determined for several reasons. The FBS was validated in a heterogeneous cohort without any formal analysis adjusted for cervical pathology. Consequently, it is unknown whether the FBS is more sensitive to a subtype of cervical spine pathology. The FBS has also not been externally validated, raising concern about its validity as a risk stratification tool. Finally, due to its non-modifiable construct, the FBS is only applicable as a risk stratification tool.
The modified frailty score (MFS) is a 19-item frailty index validated by Patel et al for predicting mortality in the orthogeriatric population 69, 83. It was constructed by including 19 of the 70 deficits within the CSHA-FI 83. The MFS is associated with higher rates of 30-day postoperative mortality following spine surgery for tuberculous spondylodiscitis 69. However, no formal analysis was performed to evaluate its predictive validity, limiting its applicability as a risk stratification tool. Many clinimetric properties of the MFS have also not been assessed. The 19 deficits included from the 70-item CSHA-FI were arbitrarily chosen without any formal analysis of convergent validity. Despite these limitations, the MFS assesses a greater number of frailty domains than other deficit accumulation measures reported in the surgical spine literature. Accordingly, the MFS is a more sensitive frailty tool in healthy populations and is less prone to instability and poor index estimates.
The Hospital Frailty Risk Score (HFRS) is a validated risk stratification tool that incorporates administrative coding into the assessment of frailty. Initially constructed by Gilbert et al 84, the HFRS contains 109 health deficits derived from International Classification of Diseases, 10th Revision (ICD-10) codes collected upon admission to hospital. The HFRS can be calculated from routinely collected data within electronic medical records without the need for extensive chart review. The HFRS has been demonstrated to be a valid risk stratification tool for predicting postoperative AEs following spine surgery for degenerative spine conditions. Similar studies validating the HFRS in non-spine surgical populations have demonstrated equivocal or superior findings for the HFRS to predict postoperative AEs 85. Given this, the HFRS is a sensitive risk stratification tool in the degenerative spine population. However, the technological requirements needed to use the HFRS may limit its applicability.
As a frailty tool, the HFRS differs from traditional deficit accumulation tools reported in the literature. The HFRS is calculated from ICD-10 codes, which are individually scored based on the prevalence of the health deficit and individual association with adverse health outcomes. Accordingly, the HFRS is a more reliable and accurate tool as the estimated risk is adjusted for the health deficits that contribute to frailty. However, many of its clinimetric properties have not been formally assessed. Gilbert et al acknowledged difficulties designing the HFRS from ICD-10 coded data as these health deficits do not capture the multisystem and dynamic progression of frailty 84. Consequently, the predictive abilities of the HFRS may be overstated compared to other frailty tools that capture the dynamic features of frailty such as functional states, phenotypic characteristics, caregiver support and fluctuations influenced by acute illnesses. Additionally, given its design and primary application as a risk stratification tool, its role as a frailty trajectory tool is significantly limited.
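The difference between a weighted score of this kind and an unweighted deficit count can be shown with a short sketch. It assumes that each qualifying code contributes its weight once and that codes are matched on their first three characters; both are assumptions made for illustration. The codes and weights below are placeholders and are not the published HFRS values.

```python
from typing import Iterable, Mapping

# Placeholder three-character codes and weights for illustration only.
# These are NOT real ICD-10 codes or published HFRS weights.
EXAMPLE_WEIGHTS: Mapping[str, float] = {
    "AAA": 2.0,
    "BBB": 1.5,
    "CCC": 0.5,
}


def weighted_score(admission_codes: Iterable[str], weights: Mapping[str, float]) -> float:
    """Sum the weights of the distinct qualifying codes recorded for a patient.

    Codes are truncated to three characters and upper-cased before lookup,
    so repeated or more specific sub-codes count only once (an assumption).
    Codes without a weight contribute nothing.
    """
    distinct = {code[:3].upper() for code in admission_codes}
    return sum(weights[code] for code in distinct if code in weights)


# Hypothetical admission: two sub-codes of "AAA", one "CCC", one unweighted code.
codes = ["AAA.1", "aaa.2", "CCC", "ZZZ"]
print(weighted_score(codes, EXAMPLE_WEIGHTS))  # 2.5
```

An unweighted count would score this hypothetical patient as two deficits regardless of which deficits they are; the weighted sum lets deficits with a stronger association contribute more to the total.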
The Risk Analysis Index (RAI), constructed by Hall et al, is a 14-item questionnaire designed for assessing frailty in surgical patients 86. It is recognized as a valid risk stratification tool for predicting postoperative AEs and identifying patients requiring preoperative optimization within the elderly surgical population 87. Within the surgical spine population, pre-frail and frail RAI scores were associated with adverse postoperative outcomes. However, multiple limitations are present within the validation study. Many of the postoperative outcomes studied occurred at an exceedingly low frequency, likely increasing the risk of a type II error and limiting the statistical power to assess the predictive validity of the RAI. Selection bias may further compromise the validity of the RAI, as Agarwal et al did not report the number of patients with complete or missing RAI and outcome data 67. Additionally, the statistical analysis did not adjust for confounding patient and operative variables. Given these limitations, no recommendation can be made regarding whether the RAI is a sensitive risk stratification tool within the surgical spine population, and further validation studies are needed.
Similar to the HFRS, the RAI differs from traditional frailty tools. Using predefined criteria, the RAI assesses multiple frailty domains to create a weighted score representative of the patient's frailty state. The content of the RAI is more sensitive for assessing frailty as it is adapted from the previously validated Minimum Data Set (MDS) Mortality Risk Index-Revised (MMRI-R) 86. Additionally, the RAI uses a defined set of items and a standardized scoring system to eliminate potential inter-rater bias or error amongst users. As the RAI has only been recently investigated in the surgical spine population, many of its clinimetric properties remain unknown. Further investigation is ultimately warranted to determine its validity and reliability in the surgical spine population. Lastly, given that the RAI is validated as a perioperative risk stratification tool, its role as a frailty trajectory tool is limited despite a modifiable construct.
The Comprehensive Geriatric Assessment (CGA) tool assesses frailty based on a multidisciplinary approach for optimizing, coordinating and integrating geriatric care. The CGA evaluates the frailty domains of function, cognition, mood and mental health, nutrition, comorbidity status, polypharmacy, and social health using validated subscales. The CGA is validated as both a risk stratification tool and an instrument for guiding preoperative optimization of frail patients 88. Within the spine population, Chang et al recently validated the CGA as a risk stratification tool for predicting postoperative AEs in elderly patients after lumbar spine surgery for degenerative disease 68. Despite a relatively small population, the study used a robust methodology with strict inclusion criteria to assess the predictive validity of the CGA. The components of the CGA also had defined values for each frailty component evaluated from either the original articles or subsequent validation study 68. However, the criterion to define frailty was chosen arbitrarily without formal sensitivity or construct validation. The sample population was also relatively heterogeneous, raising concern for type II error and a lack of statistical power. Despite these limitations, the CGA is a valid and sensitive risk stratification tool for predicting postoperative AEs within the degenerative lumbar spine population.
As a frailty tool, the CGA is highly sensitive for assessing and quantifying frailty. Given its construct, the CGA differs from previously discussed frailty tools that contain non-validated or arbitrary content to evaluate and define frailty. The CGA may be a valuable screening tool to help guide perioperative optimization of frail patients undergoing surgical intervention. CGA-targeted optimization has improved functional outcomes and reduced mortality in the community and hospital-dwelling frail population 89, 90. Furthermore, the CGA may be sensitive to capturing the relationship between spinal disease and frailty as it contains components susceptible to improvement following spine surgery. Despite these strengths, the CGA lacks standardized content, delivery, and interpretation, potentially limiting cross-population validity and reliability 91. Further studies are warranted to establish its clinimetric properties and determine the validity, reliability, and responsiveness in the surgical spine population.

Frailty Trajectory Tools
The FRAIL (fatigue, resistance, ambulation, illness, and weight loss) Scale is a validated five-item frailty tool developed by the International Academy on Nutrition and Ageing Task Force 92,93. The conceptual foundations are heavily rooted in the phenotypic frailty model as four of the items (fatigue, resistance, ambulation, and weight loss) are derived from it. Validated cutoff values are used to stratify scores into robust, pre-frail, and frail patients. Since its conceptualization, the FRAIL Scale has proven to be a reliable and valid frailty tool for identifying elderly patients at an increased risk of adverse health outcomes 94. Based on our review, the FRAIL Scale predicted a lower likelihood of postoperative functional return and a higher risk of postoperative delirium in patients undergoing elective spine surgery for degenerative disease. These findings are important considering spine surgery aims to restore function to baseline or beyond. Failure to return to or surpass baseline function is concerning as spine surgery is associated with significant risks. Given this, the FRAIL Scale may be a valuable tool in the decision-making process to identify patients requiring timely surgical intervention or preoperative optimization.
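The scoring logic of a five-item scale of this kind can be sketched as follows. The sketch assumes each item is a yes/no response worth one point and uses the commonly reported cutoffs of 0 (robust), 1-2 (pre-frail), and 3-5 (frail); these cutoffs are not stated in the text above and should be checked against the validation studies cited. The class and function names are hypothetical.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class FrailResponses:
    """One yes/no response per FRAIL item (True means the deficit is present)."""
    fatigue: bool
    resistance: bool
    ambulation: bool
    illness: bool
    weight_loss: bool


def frail_score(responses: FrailResponses) -> int:
    """Count the items that are present (0 to 5)."""
    return sum(astuple(responses))


def frail_category(score: int) -> str:
    """Map a score to a category using the assumed 0 / 1-2 / 3-5 cutoffs."""
    if not 0 <= score <= 5:
        raise ValueError("FRAIL score must be between 0 and 5")
    if score == 0:
        return "robust"
    if score <= 2:
        return "pre-frail"
    return "frail"


# Hypothetical patient reporting fatigue, reduced resistance, and weight loss.
patient = FrailResponses(
    fatigue=True, resistance=True, ambulation=False, illness=False, weight_loss=True
)
score = frail_score(patient)
print(score, frail_category(score))  # 3 frail
```

Because every item is self-reported and equally weighted, a change in a single response can move a patient across a category boundary, which is part of why such tools can register change over time.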
The Fried Phenotype is a five-item frailty tool constructed and validated by Fried et al 6. It assesses weight loss, weakness (strength), exhaustion (endurance), slowness (gait speed), and low physical activity (kilocalories) 6. Validated cutoff values are used to stratify scores into robust, pre-frail, and frail patients 6. Since its initial validation, the Fried Phenotype has been recognized as a reliable, valid, diagnostic, and sensitive assessment tool for identifying frail patients at an increased risk of early disability, morbidity, and mortality 70,95. Interestingly, our review identified that the Fried Phenotype did not predict postoperative AEs within the thoracolumbar population undergoing elective spine surgery for degenerative or deformity spine conditions. This may have been due to several factors. Firstly, the cohort size of the validation population was relatively small, increasing the risk of bias and reducing the statistical power of the risk estimates. The relationship between the Fried Phenotype and postoperative AEs may have also been confounded by the Timed Up and Go (TUG) test. As a test of physical impairment, the TUG inherently captures phenotypic elements of frailty, therefore confounding the relationship between the Fried Phenotype and postoperative AEs.
Of the frailty tools identified in our review, the FRAIL Scale and Fried Phenotype are the most sensitive for capturing the impact of spinal pathology and surgical intervention on frailty trajectory. The underpinning phenotypic construct overlaps with those clinical features of disability and weakness associated with spinal disease 13. Given this, if spine surgery aims to improve functional outcomes, the modifiable constructs of the Fried Phenotype and FRAIL Scale are sensitive to capturing changes in frailty trajectory. Although this relationship has not been studied in the spine literature, both the Fried Phenotype and FRAIL Scale have been observed as responsive tools for capturing changes in frailty trajectory 72.
The FRAIL Scale and Fried Phenotype are also potentially useful assessment tools for screening and tracking responsiveness to frailty-targeted preoperative rehabilitation 96. Over the past several years, prehabilitation has gained popularity in the literature as a means of preoperatively optimizing patients' health to improve postoperative outcomes 97. Although these programs remain rudimentary in their composition, mode of administration, and outcome measures, preliminary evidence suggests they may reduce the risk of postoperative AEs 97. Although no program has been described in the spine literature, tailored preoperative physiotherapy improves and maintains postoperative functional outcomes in patients with degenerative lumbar spine disorders 98. Considering the relationship between degenerative lumbar disease and frailty, preoperative optimization of frailty may be critical in improving outcomes following spine surgery.
However, developing a frailty-targeted prehabilitation program is challenging because health deficits are unique to each patient. The CGA may overcome this challenge as it is a powerful screening tool for identifying health deficits susceptible to optimization and tailoring multidisciplinary interventions. Initial studies investigating CGA and frailty-targeted prehabilitation with nutrition and exercise interventions have found mild phenotypic and functional improvements in hospitalized and community-dwelling geriatric patients 88, 90, 99, 100. However, it is unknown whether these improvements significantly reduce adverse outcomes, especially in the surgical context. Studies are ultimately needed to determine the most effective method of identifying susceptible health deficits and clarifying the composition, mode of administration, and clinical efficacy of prehabilitation programs.

Future Directions
Since the assessment of frailty in the surgical spine setting may be important in the clinical decision-making process, we must be confident that the assessment tools used are sensitive, reliable, and validated. The evaluation of clinimetric properties is also essential as it clarifies what constitutes a good clinical measure. Most of the frailty tools identified in this review lacked prospective external validation and formal evaluation of clinimetric properties. Future studies should focus on the prospective validation of these frailty tools to reaffirm their validity and applicability as reliable risk stratification tools.
Prospective studies are also needed to determine the validity of other well-established frailty tools for predicting postoperative AEs. Measures such as the Clinical Frailty Scale (CFS) or the Edmonton Frail Scale (EFS) are validated risk stratification tools for predicting postoperative AEs among geriatric patients undergoing major elective surgery 101,102.
Further studies are also needed to investigate the relationship between spine disease, surgical intervention, and frailty. Given that symptomatic spine disease is a risk factor for frailty, timely spine surgery may be an effective intervention to reverse frailty and reduce adverse health outcomes. Conversely, if spinal disease incurs a greater risk of frailty, the likelihood of adverse health outcomes is inherently increased for patients waiting for spine surgery. Validating this relationship with phenotypic tools may better identify patients requiring timely surgery.
Unfortunately, few studies have investigated the concept of frailty reversibility. Consequently, it is unknown whether a specific threshold of reversible health deficits is required to significantly reduce the risk of adverse health outcomes. It is also unknown whether a therapeutic limit exists whereby the number of reversible frailty deficits becomes saturated and no longer imparts a reduction in the risk of long-term mortality and disability. Finally, it is unclear if the concept of frailty reversibility is validated for specific operational definitions of frailty. Future studies are needed to investigate these concerns and determine whether prehabilitation and spine surgery are effective interventions in reversing frailty and reducing adverse health outcomes in patients with spinal disease.

Strengths and Limitations
This review contains several strengths and limitations. The literature search used a broad search terminology to identify all possible studies reporting a frailty tool in the surgical spine population. The use of two independent reviewers during the literature search and study identification phases reduces the likelihood of possible biases such as selection bias, publication bias, and competing interests. The approach to tool evaluation was also structured and well defined. A validated set of qualitative appraisal criteria was used to evaluate and transparently report the clinimetric properties for each frailty tool. Lastly, the recommendations suggested were determined by panel consensus. This methodology reduces any potential biases or competing interests.
Despite these strengths, our review has several limitations. Some of the definitions utilized in this study, especially those on feasibility, applicability, and objectivity, have not necessarily been validated. To reduce bias and subjectivity, we identified previously published definitions as a guide for formulating the criteria used in this review. Use of search limitations such as "English Language" and "Full-text only" may also reduce the scope of articles we could capture. Consequently, this review may under-report the frailty tools currently studied within the surgical spine literature.
Another limitation was the inability to include frailty tools reported in patient populations with neurological features similar to the surgical spine population. Inclusion of such articles would have allowed us to appraise a greater range of frailty tools. However, the studies identified during the initial design of this review defined populations by underlying diagnosis, not clinical features. This resulted in study populations with heterogeneous neurological features that are not comparable or relevant to those clinical features within the surgical spine population. Given this poor cross-population comparability, the methodology of this review was re-drafted to exclude these studies, as their inclusion would have reduced the strength of our analysis and recommendations.

Conclusion
Frailty measures within surgical spine practice are important tools in the surgical decision-making process regarding risk stratification, timely surgical intervention, and prehabilitation. Fourteen frailty tools were identified across the surgical spine literature, with most validated as risk stratification tools for predicting postoperative AEs. Although most measures were feasible and objective, many lacked assessment of multiple clinimetric properties. Instruments derived from the cumulative deficit and weighted frailty models containing non-modifiable constructs are the most appropriate risk stratification tools. Phenotypic frailty tools are the most sensitive for capturing the effects of spinal disease, spine surgery, and prehabilitation on frailty trajectory. The CGA is an effective screening instrument for identifying health deficits susceptible to improvement through tailored preoperative optimization programs.
Studies are needed to investigate whether spine surgery or prehabilitation are effective interventions in reversing frailty, improving longitudinal health outcomes and reducing the risk of postoperative AEs in patients with spine disease. Finally, studies are needed to formally evaluate the clinimetric properties of the frailty tools within the surgical spine population.

Declarations

Ethics Approval and Consent to Participation
Not Applicable.

Consent for Publication
Not Applicable.

Availability of Data and Materials
All relevant data extracted from the studies included within the review can be found in Appendix-2, MoskvenSupplementalData2.docx.

Competing Interests
The authors declare that they have no competing interests.

Funding
The authors declare that no funding was received during the design, collection, analysis and interpretation of data and writing of the manuscript.

Authors' Contributions
EM was involved in the conception and design; acquisition, analysis, and interpretation of data; and drafting of the manuscript. R.C-M was involved in the conception and design; acquisition, analysis, and interpretation of data; and drafting of the manuscript. AMF was involved in the conception and design; drafting of the manuscript; and critical revision of the manuscript for important intellectual content. JTS was involved in the conception and design; supervision; and critical revision of the manuscript for important intellectual content. All authors read and approved the final manuscript for submission.

Acknowledgements
Not applicable.