From an initial 2058 records, 95 were retrieved for full-text review and 24 were selected (Figure 1). Seventy-one records were excluded: 44 did not include constructs relating to falls-related self-efficacy or balance confidence, 11 assessed other measurement properties, six did not assess measurement properties, two were conducted on different populations, two were abstracts, one was a thesis, one was in citation and four were written in other languages (i.e. Persian, German, Dutch). The 35 included records, comprising 24 full-text articles meeting the eligibility criteria and 11 additional articles from citation tracking, were used to evaluate instrument development (16 studies), content validity (33 studies) and structural validity (14 studies).
Content validity
Quality of instrument development studies
A summary of the studies detailing the construct definition, target population and intended context of use for the 18 instruments is presented in Additional file 3. Nine studies related to scales measuring falls efficacy and four related to the construct of balance confidence. Three studies related to scales whose titles refer to falls efficacy but which measured concerns about falling rather than constructs relating to falls efficacy or balance confidence.
Concept elicitation was identified as inadequate for 15 instruments because no target population had been involved in their development. For the other instruments (i.e. ABC-16, CONFBal and Mobility Efficacy Scale (MES)), concept elicitation was doubtful because of unclear methods. Among all instrument development studies, only that for the Icon-FES featured cognitive interviews with older adults. However, the quality of these interviews was doubtful because neither the older adults’ characteristics nor the methodology of the interview process was described.
Quality and results of content validity studies
Forty-seven studies were reviewed for the content validity of the instruments: 34 involved a target population (see Additional file 4) and 13 involved professionals (see Additional file 5). No studies on the content validity of the Gait Efficacy Scale (GES)-8 were found. ABC-16 had the largest number of studies (18), representing 32% of the studies involving older adults and 54% of those involving professionals. Among the scales for which the target population was involved in assessing content validity, only one study (MFES-13) was of adequate quality to assess relevance, comprehensibility and comprehensiveness. Two studies on relevance (FES-10 and ABC-6) and one study on comprehensibility (FES-10) were of inadequate quality. Fifteen content validity studies involving target populations (44%) were cross-cultural adaptations that included a pretest of the translated instruments (FES-10, MFES-13, MFES-14, ABC-6, ABC-16), with the largest share on ABC-16 (60%). Among these, six studies assessing relevance and six studies assessing comprehensibility were of doubtful quality. All other studies were of either inadequate or indeterminate quality. None of these studies assessed comprehensiveness adequately. Overall, the cross-cultural adaptation studies were of doubtful (47%), inadequate (13%) or indeterminate (40%) quality.
Of the 13 content validity studies involving professionals, 10 were cross-cultural adaptation studies. Two studies on the original instruments explored the relevance of the FES-10 and the comprehensiveness of the Icon-FES, but both were of doubtful quality (7, 13). All cross-cultural adaptation studies, which covered six instruments (FES-10, MFES-13, MFES-14, ABC-15, ABC-16, Icon-FES), were of doubtful or indeterminate quality.
Evidence synthesis for falls efficacy scales
Among all instruments evaluating falls efficacy, MFES-13 had high-quality evidence demonstrating sufficient results for relevance and insufficient results for comprehensiveness (each based on one adequate-quality study and reviewers’ rating) (33). Moderate-quality evidence was available only for FES-10, which had sufficient results for relevance (based on one doubtful-quality study); MFES-13, which had inconsistent results for comprehensibility (based on one adequate-quality study and one doubtful-quality study); and MFES-14, which had sufficient results for comprehensibility (based on two doubtful-quality studies) (33-36). For all other instruments measuring falls efficacy, evidence quality was generally low to very low (see Additional file 7): no relevant content validity studies were available, and related studies were of inadequate quality based on reviewers’ ratings.
Evidence synthesis for balance confidence scales
Among all instruments evaluating balance confidence, moderate-quality evidence was available only for the ABC-15, which displayed sufficient results for relevance (based on one content validity study of doubtful quality) (22). However, its insufficient results for comprehensiveness and sufficient results for comprehensibility were supported by very low-quality evidence. As with the falls efficacy scales, evidence quality for instruments measuring balance confidence was generally low to very low (see Additional file 7): no relevant content validity studies were available, and even the related studies were of inadequate quality based on reviewers’ ratings.
Evidence synthesis for scales with titles relating to falls efficacy
Three scales with titles relating to falls efficacy (Icon-FES, FES-I and MES) were developed to measure fear of falling or concerns about falling (12, 13, 37). The Icon-FES was the only scale underpinned by moderate-quality evidence, showing sufficient results for relevance and comprehensiveness (based on one doubtful-quality study) (13). Other assessments for Icon-FES, FES-I and MES were rated as low to very low by reviewers, given the lack of relevant content validity studies of sufficient quality.
Structural validity
Quality and results of studies
A total of 14 studies (see Additional file 6) assessed the structural validity of scales measuring falls-related self-efficacy (4 studies) (4, 9, 36), scales measuring balance confidence (8 studies) (22, 38-43) and scales with titles relating to falls efficacy (2 studies) (12, 13). Most authors used exploratory factor analysis (EFA, 72%) (4, 9, 12, 36, 38, 39, 42-44). The remainder used an item response theory (IRT) Rasch model (7%) (40), an IRT polytomous model (7%) (22) or more than one method of analysis (14%) (13, 41). Most of the studies (93%) were of at least adequate quality (69% high quality and 31% adequate quality). Only one study was of inadequate quality because its sample size was insufficient for the analysis (39).
Evidence synthesis
All studies on FES-10, MFES-14, ABC-6, ABC-15, ABC-16, Icon-FES, FES-I and PAPMFR reported positive results, providing high-quality evidence of sufficient unidimensionality. All other instruments received indeterminate ratings.