From an initial 2058 records, 95 were retrieved for full-text review and 24 were selected (Figure 1). Seventy-one records were excluded: 44 did not include constructs relating to falls-related self-efficacy or balance confidence, 11 assessed other measurement properties, six did not assess measurement properties, two were conducted in different populations, two were abstracts, one was a thesis, one was available only as a citation and four were written in other languages (i.e. Persian, German and Dutch). In total, 35 records were included: the 24 full-text articles that met the eligibility criteria and 11 additional articles identified through citation tracking. These were used to evaluate instrument development (16 studies), content validity (33 studies) and structural validity (14 studies).
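The record flow reported above can be checked with simple arithmetic. The sketch below is illustrative only: the dictionary keys are paraphrased labels for the exclusion reasons, and `check_flow` is a hypothetical helper written for this example, not part of the review's methods.

```python
# Full-text exclusion reasons and counts as reported in the text
# (keys are paraphrased labels chosen for this sketch).
EXCLUSIONS = {
    "no falls-efficacy or balance-confidence construct": 44,
    "assessed other measurement properties": 11,
    "did not assess measurement properties": 6,
    "different population": 2,
    "abstract": 2,
    "thesis": 1,
    "citation only": 1,
    "other language": 4,
}


def check_flow(retrieved: int, selected: int, tracked: int, exclusions: dict) -> tuple:
    """Return (records excluded, records included), checking that the flow balances."""
    excluded = sum(exclusions.values())
    # Every full-text record is either selected or excluded for a listed reason.
    if retrieved - excluded != selected:
        raise ValueError("exclusion reasons do not account for all excluded records")
    # Included records = selected full texts plus those found by citation tracking.
    return excluded, selected + tracked


excluded, included = check_flow(95, 24, 11, EXCLUSIONS)
print(excluded, included)  # 71 35
```

The eight exclusion reasons sum to 71, which equals 95 retrieved minus 24 selected, and adding the 11 citation-tracking articles gives the 35 included records.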
Content validity
Quality of instrument development studies
A summary of the construct definition, target population and intended context of use reported for the 18 instruments is provided in Additional file 3. Nine studies concerned scales measuring falls efficacy and four concerned scales measuring balance confidence. The remaining three studies concerned scales whose titles refer to falls efficacy; however, these scales measured concerns about falling rather than falls efficacy or balance confidence.
Concept elicitation was rated inadequate for 15 instruments because the target population had not been involved in their development. For the remaining three instruments (ABC-16, CONFBal and the Mobility Efficacy Scale (MES)), concept elicitation was rated doubtful because the methods were unclear. Among all instrument development studies, only that of the Icon-FES included cognitive interviews with older adults. However, the quality of these interviews was doubtful because neither the sample characteristics nor the interview methods were described.
Quality and results of content validity studies
In total, 47 studies were reviewed for the content validity of the instruments: 34 involved the target population (see Additional file 4) and 13 involved professionals (see Additional file 5). No content validity studies were found for the Gait Efficacy Scale (GES)-8. ABC-16 had the largest number of content validity studies (18), accounting for 32% of the studies involving older adults and 54% of those involving professionals. Among the studies involving the target population, only one (MFES-13) was of adequate quality for assessing relevance, comprehensibility and comprehensiveness. Two studies on relevance (FES-10 and ABC-6) and one study on comprehensibility (FES-10) were of inadequate quality. Fifteen of the studies involving the target population (44%) were cross-cultural adaptations that included a pretest of the translated instrument (FES-10, MFES-13, MFES-14, ABC-6 and ABC-16), most of them on ABC-16 (60%). Among these, six studies assessing relevance and six assessing comprehensibility were of doubtful quality, the remaining studies were of inadequate or indeterminate quality, and none assessed comprehensiveness adequately. Overall, these cross-cultural adaptation studies were of doubtful (47%), inadequate (13%) or indeterminate (40%) quality.
Of the 13 content validity studies involving professionals, 10 were cross-cultural adaptation studies. Two studies of the original instruments explored the relevance of the FES-10 and the comprehensiveness of the Icon-FES, but both were of doubtful quality (7, 13). All cross-cultural adaptation studies involving professionals, which covered six instruments (FES-10, MFES-13, MFES-14, ABC-15, ABC-16 and Icon-FES), were of doubtful or indeterminate quality.
Evidence synthesis for falls efficacy scales
Among the instruments evaluating falls efficacy, MFES-13 had high-quality evidence of sufficient results for relevance and of insufficient results for comprehensiveness, each based on one adequate quality study and the reviewers' rating (32). Moderate-quality evidence was available only for FES-10, which had sufficient results for relevance (based on one doubtful quality study); MFES-13, which had inconsistent results for comprehensibility (based on one adequate and one doubtful quality study); and MFES-14, which had sufficient results for comprehensibility (based on two doubtful quality studies) (32-35). For all other instruments measuring falls efficacy, the quality of evidence was generally low to very low (see Additional file 7), because either no content validity studies were available or the available studies were of inadequate quality, leaving the reviewers' ratings as the main source of evidence.
Evidence synthesis for balance confidence scales
Among the instruments evaluating balance confidence, moderate-quality evidence was available only for the ABC-15, which showed sufficient results for relevance (based on one content validity study of doubtful quality) (23). Its insufficient results for comprehensiveness and sufficient results for comprehensibility were supported only by very low quality evidence. For all other instruments measuring balance confidence, the quality of evidence was likewise generally low to very low (see Additional file 7), because either no content validity studies were available or the available studies were of inadequate quality, leaving the reviewers' ratings as the main source of evidence.
Evidence synthesis for scales with titles relating to falls efficacy
Three scales with titles referring to falls efficacy (Icon-FES, FES-I and MES) were developed to measure fear of falling and/or concerns about falling (12, 13, 36). The Icon-FES was the only one of these supported by moderate-quality evidence, which showed sufficient results for relevance and comprehensiveness (based on one doubtful quality study) (13). All other evidence for the Icon-FES, FES-I and MES was rated as low to very low quality, because none of the relevant content validity studies was of sufficient quality.
Structural validity
Quality and results of studies
A total of 14 studies (see Additional file 6) assessed the structural validity of scales measuring falls-related self-efficacy (4 studies) (4, 9, 35), balance confidence (8 studies) (23, 37-42) and scales with titles relating to falls efficacy (2 studies) (12, 13). Most studies used exploratory factor analysis (EFA; 72%) (4, 9, 12, 35, 37, 38, 41-43); the others used an item response theory (IRT) Rasch model (7%) (39), an IRT polytomous model (7%) (23) or more than one method of analysis (14%) (13, 40). Overall, 93% of the studies were of at least adequate quality (64% high and 29% adequate). Only one study was of inadequate quality, because its sample size was insufficient for the analysis (38).
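The quality percentages above can be reproduced from study counts out of the 14 structural validity studies. In this sketch the per-rating counts (9 high, 4 adequate, 1 inadequate) are an assumption: only the inadequate count is stated directly, and the other two are back-calculated from the reported percentages. The helper `pct` is hypothetical.

```python
# Total number of structural validity studies reported in the text.
N_STUDIES = 14

# ASSUMPTION: counts per quality rating back-calculated from the reported
# percentages (64% high, 29% adequate) and the single inadequate study.
quality_counts = {"high": 9, "adequate": 4, "inadequate": 1}


def pct(count: int, total: int = N_STUDIES) -> int:
    """Percentage of studies, rounded to the nearest whole number."""
    return round(100 * count / total)


at_least_adequate = quality_counts["high"] + quality_counts["adequate"]

print(pct(quality_counts["high"]))      # 64
print(pct(quality_counts["adequate"]))  # 29
print(pct(at_least_adequate))           # 93
```

Because the percentages are rounded separately, "at least adequate" (93%) is computed from the combined count rather than by adding the rounded 64% and 29%.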
Evidence synthesis
All studies on the FES-10, MFES-14, ABC-6, ABC-15, ABC-16, Icon-FES, FES-I and PAPMFR reported positive results, providing high-quality evidence of sufficient unidimensionality. All other instruments received indeterminate ratings.