Phase 1: Tool development
We used two sequential stages to develop the Quality Indicators for Physiotherapy Management of Hip and Knee Osteoarthritis (QUIPA) tool: 1) drafting patient-reported QIs based on the clinical guideline recommendations most relevant to physiotherapy practice, as identified in a recent consensus study and the UK-QI questionnaire, and 2) refining the language and format of the QUIPA tool to ensure it was consumer friendly.
The research team members involved in this study included physiotherapists who are also experts in OA research (KB, RH, TE, KD) and in QI development and implementation (KD). KD has extensive experience in QI development and use for implementation of National Institute for Health and Care Excellence Quality Standards through clinical tools and patient questionnaires. KD was involved in the UK-QI study, which included patient and public involvement and engagement, as experts by experience.
Stage 1 – drafting of patient-reported quality indicators for physiotherapy care
Draft QIs were derived from a final list of clinical guideline recommendations for hip and/or knee OA that a recent consensus study proposed as being most relevant to physiotherapy care. That study first extracted recommendations from two high-quality clinical guidelines [1, 29] and then convened a panel of 62 international physiotherapists, who completed an online modified-Delphi survey followed by a priority-ranking exercise to identify and rank the recommendations most relevant to physiotherapy practice. The final 30 recommendations were then synthesized and grouped by content area to convey physiotherapy management of hip and/or knee OA. A conceptual model based on the results of the study was used when developing the QUIPA tool. The four main content areas of the final recommendations were condensed to form the three subscales of the QUIPA tool. We aimed to develop a QI relevant to each of the 30 recommendations on the final list, whilst minimising redundancy across items. Thus, where recommendations were similar, we developed a QI based only on the highest ranked recommendation. We did not develop a QI for a recommendation if the research experts deemed it difficult to assess in a physiotherapy consultation, already captured in another individual QI, related to a health service program instead of an individual treatment, or unable to be executed by a physiotherapist (e.g. referring patients for joint surgery). Where the recommendations overlapped with those in the UK-QI questionnaire, we used phrasing similar to that of the UK-QI questionnaire because it had been through a rigorous development process, involved patient participation and was based on the most recent QIs, both from the Norwegian patient-reported QI questionnaire and from the systematic review in 2013. Although the Norwegian team has since revised and validated their QI questionnaire, it contains QIs similar to those of the previous version.
The first draft of the QUIPA tool is attached in Additional file 1.
Stage 2 – Refinement of the language and format of the QUIPA tool
Patient and public involvement
A convenience sample of 15 people with hip and/or knee OA living in Melbourne, Australia was recruited from our research database and via Facebook to participate in one of three face-to-face focus groups to further refine the QUIPA tool. Inclusion criteria were: i) aged 45 years or above, ii) had been told by a health professional that they had OA in their hip and/or knee, iii) had seen a physiotherapist for their hip and/or knee OA in the previous 3 months, and iv) able to attend the University at the allocated session date/time. Ethical approval was granted by the School of Health Sciences Human Ethics Advisory Group, University of Melbourne (Ethics Application 1750532).
Each focus group session ran for 90 minutes and was moderated by a research team member and an assistant. Sessions were audio-recorded. Participants first completed a questionnaire about demographics as well as hip/knee pain and function. They were then presented with the draft QUIPA tool and asked to explain what they understood each QI item to mean, to ensure consistency with its original intent, a technique known as cognitive debriefing. They were also asked to comment on wording clarity. The QUIPA tool was projected onto a presentation screen to allow the research assistant to alter the wording of the QIs in real time during the group session. Participants were also asked to comment on the appropriateness of the tool response scale and its overall format and layout [22, 32]. The research team revised and reworded the QUIPA tool following each focus group session before presenting the revised version to the subsequent group. Table 1 presents the final version of the QUIPA tool.
Phase 2: Clinimetric evaluation of the QUIPA tool
The evaluation study was performed between August and December 2018. Participants with hip and/or knee OA were recruited to attend a single one-on-one consultation with a designated study physiotherapist for assessment and treatment of their affected joint(s). They were then required to complete the QUIPA tool online at three time points: one week (W1), twelve weeks (W12) and thirteen weeks (W13) after their consultation. A three-month recall period was selected for the QUIPA tool to capture either single or multi-session episodes of physiotherapy care and has been utilised in other comparable tools [22, 23]. For the purpose of this study, participants were asked not to have any further physiotherapy consultations for their affected hip and/or knee joint(s) during the thirteen weeks to avoid treatment confusion. For the purpose of this clinimetric evaluation, we also established a physiotherapist version of the QUIPA tool, which contained the same items but worded from the physiotherapists’ perspective. Physiotherapists completed the tool immediately post-consultation (W0). Ethical approval was granted by the School of Health Sciences Human Ethics Advisory Group, University of Melbourne (Ethics Application 1750925).
To evaluate patient test-retest reliability, we examined the participant responses between W12 and W13. We used three a priori hypotheses to assess construct validity. The hypotheses reflected anticipated response patterns among contrasting subgroups in relation to body mass index (BMI), pain level with walking and daily functional ability. Criterion validity was determined by assessing agreement between physiotherapists and participants at W1. We defined responses from the physiotherapists as the ‘gold standard’ because we expected them to be more accurate than those of the participants: physiotherapists completed the tool immediately after the consultation session and knew what treatment they had administered.
A convenience sample of adults aged 45 years or over with self-reported hip and/or knee OA was recruited from the CHESM research database and by advertisements on Facebook. We aimed for a minimum of 50 people to participate in the clinimetric study because this sample size is the minimum recommended for any health questionnaire validity and/or reliability study. The proposed minimum sample size allowed for a broad cross-sectional representation of people with hip and/or knee OA, including ages, genders and OA severity.
Participants were required to meet the National Institute for Health and Care Excellence OA clinical criteria: i) aged 45 years or above, ii) have activity-related hip and/or knee pain and iii) have no more than 30 minutes of morning stiffness in their hip and/or knee. Participants were excluded if they had inflammatory arthritis, had undergone hip/knee replacement surgery for the affected hip/knee(s), planned to see another physiotherapist within thirteen weeks and/or were unable to give consent, attend an appointment with one of the study physiotherapists or complete the questionnaires online at the specified time points.
We recruited nine physiotherapists currently registered to practise in Australia and working in private practice settings within Melbourne to ensure geographical spread around Melbourne for participants’ convenience.
Participants received one consultation from their designated study physiotherapist at no cost to themselves. To increase variability in the care provided within a standard 30-minute consultation, physiotherapists were provided with different cue cards that contained specific tasks/treatments they were requested to do, or not do, with the participants. Participants were informed that the physiotherapists would provide a range of different treatments to different participants, and thus individual participants did not have any preconceived ideas about what they would or would not receive. Participants were emailed a link to the online QUIPA tool at one, twelve and thirteen weeks following their physiotherapy session and were asked to complete the tool as soon as they could. With the W1 QUIPA tool, participants were also asked to provide information about demographics, other medical conditions, height and weight, and to complete the pain and function subscales of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). Each time they completed the QUIPA tool, participants were asked whether they had seen another physiotherapist. Reminder emails and text messages were sent to non-responders daily (up to three times) after responses were due. To maximise completion of surveys, those who completed all three were entered into a draw to win a $100 gift card.
Physiotherapists were asked to complete the QUIPA tool online immediately following each consultation. Physiotherapists were reimbursed $60 for each participant they saw.
Test-retest reliability of the QUIPA tool for participants was determined by comparing their responses between W12 (+/-7 days) and W13 (-2/+7 days). Test-retest reliability for individual QI items was assessed by calculating Cohen’s Kappa with 95% confidence intervals (CI), the percentage of observed agreement (i.e. the percentage of occasions when the answer was identical between W12 and W13), and the percentage of expected agreement (i.e. the percentage of occasions when the answer was expected by chance to be identical between W12 and W13). Cohen’s Kappa expresses how far the observed agreement exceeds the agreement expected by chance. Kappa values were interpreted according to Landis and Koch: 0-0.20 slight; 0.21-0.40 fair; 0.41-0.60 moderate; 0.61-0.80 substantial and 0.81-1.00 almost perfect reliability.
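As an illustration of the item-level calculation described above, the following is a minimal sketch in Python. The function names and the example answers are hypothetical and are not study data; confidence intervals are not computed here. The label returned for negative kappa values is our own placeholder, since the bands quoted above start at 0.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Return (observed agreement, expected agreement, kappa) for paired answers."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty answer lists of equal length")
    n = len(first)
    # Observed agreement: share of participants giving the identical answer twice.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Expected (chance) agreement: product of the marginal proportions per category.
    counts_first = Counter(first)
    counts_second = Counter(second)
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa


def landis_koch(kappa):
    """Map a kappa value to the interpretation bands quoted in the text."""
    if kappa < 0:
        return "below chance"  # placeholder label, not one of the quoted bands
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical answers to one QI item from ten participants (illustration only).
w12 = ["yes", "yes", "no", "yes", "no", "no", "yes", "don't remember", "yes", "no"]
w13 = ["yes", "yes", "no", "no", "no", "no", "yes", "don't remember", "yes", "yes"]

observed, expected, kappa = cohen_kappa(w12, w13)
print(f"observed={observed:.0%} expected={expected:.0%} "
      f"kappa={kappa:.2f} ({landis_koch(kappa)})")
```

In this made-up example, eight of ten answers match, so observed agreement is 80%, while the marginal distributions alone would produce 42% agreement by chance.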
Test-retest reliability for each QUIPA subscale and the total score was assessed with intraclass correlation coefficients (ICC) (95% CI) estimated using a two-way mixed effect model. An ICC of ≥0.70 was considered acceptable.
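A two-way mixed-effects ICC can be sketched as follows. The text does not state whether the single-measure or average-measure form, or the consistency or absolute-agreement definition, was used; this sketch assumes the single-measure consistency form, often written ICC(3,1). The function name and the scores are hypothetical, and confidence intervals are omitted.

```python
def icc_3_1(scores):
    """Single-measure, consistency ICC from a two-way mixed-effects model.

    scores: one row per participant, one column per occasion (e.g. W12, W13).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two participants")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every participant needs the same number (>= 2) of scores")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into participants, occasions and error.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical subscale scores at W12 and W13 for five participants.
example_scores = [[10, 11], [14, 13], [18, 18], [9, 11], [16, 15]]
icc = icc_3_1(example_scores)
print(f"ICC = {icc:.2f}; acceptable (>= 0.70): {icc >= 0.70}")
```

Under the consistency definition, a uniform shift between occasions (every participant scoring one point higher at retest, say) does not lower the ICC; an absolute-agreement definition would penalise it.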
Construct validity was assessed with three predefined hypotheses. We first hypothesized that people responding ‘not overweight’ for the QI on benefits of losing weight (item #13a) would self-report lower BMI compared to those responding ‘yes’, ‘no’ or ‘don’t remember’. We also hypothesized that people responding ‘no such problems’ for the QIs on the walking aid item (#14) and the appliances and aids item (#15) would, respectively, report no difficulty with walking and score lower for total physical function on the WOMAC compared to those responding ‘yes’, ‘no’ or ‘don’t remember’. Chi-square tests were used to test the first and second hypotheses and a t-test was used for the third hypothesis. The threshold for statistical significance was p ≤ 0.05 for both tests. Validity was considered acceptable if ≥75% of the predefined hypotheses were confirmed.
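The acceptance rule can be made concrete with a short sketch. The helper names and the p-values below are hypothetical and serve only to show the arithmetic; treating a hypothesis as confirmed when the effect is in the predicted direction and p ≤ 0.05 is our reading of the criteria above.

```python
ALPHA = 0.05       # significance threshold stated in the text
THRESHOLD = 0.75   # share of hypotheses that must be confirmed


def is_confirmed(p_value, direction_as_predicted):
    """Assumed rule: predicted direction and p <= 0.05."""
    return direction_as_predicted and p_value <= ALPHA


def share_confirmed(flags):
    """Proportion of hypotheses confirmed."""
    if not flags:
        raise ValueError("no hypotheses supplied")
    return sum(flags) / len(flags)


# Illustrative (made-up) results for the three hypotheses.
results = [
    is_confirmed(0.01, True),   # BMI hypothesis
    is_confirmed(0.03, True),   # walking difficulty hypothesis
    is_confirmed(0.20, True),   # WOMAC function hypothesis: not significant
]
share = share_confirmed(results)
print(f"{share:.0%} confirmed; acceptable: {share >= THRESHOLD}")
```

With only three hypotheses, two confirmations give about 67%, so the ≥75% criterion is met only when all three are confirmed.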
Criterion validity of the QUIPA tool was determined by assessing agreement between physiotherapists and participants at W1 on individual items, each subscale and the total score of the QUIPA tool. To assess agreement for individual QI items, Cohen’s Kappa (95% CI), the percentage of observed agreement and percentage of expected agreement between physiotherapists and patients were calculated. Agreement for each subscale and the total score was assessed with an ICC (95% CI) estimated using a two-way mixed effect model.
Pass rates for individual QIs
The pass rate (%) for each QI was calculated based on responses from physiotherapists and patients at Week 1, where the numerator represented the total of ‘yes’ answers for the QI and the denominator was the total of ‘yes’ and ‘no’ answers for the QI. The denominator did not include other response options as they were deemed not relevant to a calculation of pass rate.
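The pass-rate calculation can be sketched as below. The function name and the example responses are hypothetical; the response labels follow the options quoted in the text.

```python
def pass_rate(responses):
    """Pass rate (%) for one QI: 'yes' / ('yes' + 'no').

    Other response options (e.g. "don't remember", "not overweight",
    "no such problems") are left out of the denominator.
    Returns None when nobody answered 'yes' or 'no'.
    """
    yes = responses.count("yes")
    no = responses.count("no")
    if yes + no == 0:
        return None
    return 100 * yes / (yes + no)


# Hypothetical Week 1 responses to a single QI (illustration only).
answers = ["yes", "yes", "no", "don't remember", "yes", "not overweight"]
print(pass_rate(answers))  # 3 'yes' out of 4 eligible answers
```

Returning None for a QI with no eligible answers is a design choice in this sketch; it avoids reporting a misleading 0% when the QI simply did not apply to anyone.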