Quality of primary care from the patient's point of view: a systematic review. There is currently no acceptable tool for practice or research purposes.

Background To inventory instruments that assess quality of care from patients' experiences in primary care, in the context of multi-disciplinary health-care centres, and to appraise their measurement quality, taking into account the methodological quality of their validation studies.
Methods Systematic review using Medline, Pascal, PsycINFO, Google Scholar, Cochrane, Scopus and CAIRN. For each instrument identified, the level of evidence was assessed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist; the psychometric quality of each measurement property was appraised using three possible quality scores; and a best-evidence synthesis was produced based on the number of studies, their methodological and psychometric quality, and the direction and consistency of the results. Details of the subscales used to capture patients' experiences of primary care were extracted and synthesized by grouping them into the nine dimensions defined by the Institute of Medicine (IOM).
Results 29 articles describing 29 instruments were found. The constructs captured by the assessment tools illustrated a diversity of conceptualizations of patients' experiences of primary care, and there was no clear consensus across the tools about what a measure of patient experience in primary care ought to capture. There is an overall lack of evidence of their measurement quality, either because validation is missing or because methods are poor.
Conclusion Due to the lack of evidence, choosing the most appropriate instrument is difficult. Improvement and validation of existing instruments, and the use of the COSMIN guidelines, could help make evaluations more effective.

Involvement of family and friends: This dimension of patient-centered care focuses on accommodating family and friends on whom patients may rely, involving them, as appropriate, in decision making, supporting them as caregivers, making them welcome and comfortable in care and recognizing their needs and contributions.
Timeliness: Timeliness is an important characteristic of any service and is a legitimate and valued focus of improvement in health care. However, long waits are the rule rather than the exception. In addition to emotional distress, physical harm may result, for example, from a delay in diagnosis or treatment that leads to preventable complications.
Efficiency: In an efficient health care system, resources are used to get the best value for money. This is also true for most improvements in safety, which result in fewer injuries, continually reduce the burden of illness, injury, and disability, and improve the health and functioning of the population.
Equity/accessibility: The aim of equity is to secure the benefits of care for all patients. Equity in care implies universal access.
In the face of growing interest in patients' perceptions of their care, many researchers are interested in patient satisfaction and experience. Satisfaction is the patient's appreciation of the care received (9). Satisfaction surveys show an optimistic and limited picture of care (11). They do not really assess what has to be improved (12). Faced with this, researchers became more interested in the patient's experience (13).
Patient Reported Outcome Measures (PROMs) are tools used to measure patient-reported outcomes (14). PROMs are standardized, validated questionnaires that are completed by patients during the perioperative period to ascertain perceptions of their health status, perceived level of impairment, disability, and health-related quality of life. They allow the efficacy of a clinical intervention to be measured from the patients' perspective. Questionnaires are given to patients both pre- and post-operatively to allow comparison of outcomes before and after the procedure. In addition to outcomes relating to interventions, PROMs measure patients' perceptions of their general health or their health in relation to a specific disease. PROMs are a means of measuring clinical effectiveness and safety.
In parallel, Patient Reported Experience Measures (PREMs) gather information on patients' views of their experience whilst receiving care (14). They are an indicator of the quality of patient care, although they do not measure it directly. PREMs most commonly take the form of questionnaires. In contrast to PROMs, PREMs do not look at the outcomes of care but at the impact of the process of care on the patient's experience, e.g., communication and timeliness of assistance. They differ from satisfaction surveys by reporting objective patient experiences, removing the ability to report subjective views.
In the context of multi-morbidity, the two measures are intertwined, and the distinction has little significance (14).
The USA was the first to conduct a standardized national survey of patient care assessment: the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) (15). In the UK, the government launched a survey of primary care patients in 2006 (16). The National Health Service (UK) now encourages local surveys, undertaken in consultation with patients, in addition to national surveys (17). Canada (18) is also experimenting with patient surveys. No such data exist in France in the context of primary care.
Measuring patient perception of their experience of primary care, in its fully multidimensional sense, requires a robust instrument of measurement. Reviews have shown that many instruments have been developed over time (19,20). These existing reviews have not systematically appraised the measurement properties of the instruments found. Therefore, a systematic review was needed to identify the instruments which measured quality of primary care from the patient's perspective and also evaluated the measurement properties of these instruments.

Methods
To minimize potential sources of bias, this systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Search Strategy
We conducted a systematic literature search in Medline, Pascal, Cochrane, Scopus, Cairn, PsycINFO and Google Scholar between 1990 and November 2018. 1990 was chosen as the start date because multi-professional practice only emerged as the main practice model in that year. It would therefore be very unlikely that relevant instruments developed before 1990 would explore multi-professional domains for quality of primary care. Different Medical Subject Headings (MeSH) terms and keywords in four domains were used, including "questionnaire", "patient satisfaction", "patient experience" and "primary health care", with the help of the Health Library of Brest University. The search was restricted to English- or French-language articles. Reference lists were screened to identify additional relevant studies. The search strategy for Medline can be found in an additional file [see additional file 1].

Selection of eligible articles
The search aimed to include all articles that described the development or evaluation of instruments that measured quality of primary care. Articles that evaluated instruments which measured other constructs (such as quality of life, health status, preferences…) were not included. The inclusion criteria are presented in Table 1.
Titles and abstracts were independently screened by two researchers (JD and TP). If the title and abstract did not clearly indicate whether the inclusion criteria had been met, the full text was obtained and reviewed by the same researchers. Where necessary, a third reviewer (JYLR) was consulted for a final decision. The reference list of each included article was then checked, following the same inclusion process, to add further relevant articles to the pool of included articles.

Data extraction
For each article included we extracted data on the methods and results for measurement properties and interpretability (see Table 2). In the case of an article describing the evaluation of multiple instruments, the data extraction was performed separately for each instrument investigated.
For each instrument identified within the articles included, the data extracted was as follows: i. the measurement characteristics, i.e. underlying measurement model, number of subscales and items, response scale, and score range; ii. dimensions for quality of care: for each questionnaire, we identified which dimensions for quality of care were measured; and iii. measurement properties.
The measurement properties within each questionnaire are categorized within three domains, according to the COSMIN taxonomy: (1) reliability (including internal consistency, reliability and measurement error), (2) validity (including content validity, structural validity, criterion validity, cross-cultural validity and hypothesis testing (construct validity)), and (3) responsiveness. For each article included, the data was extracted by one team member (TP) and checked by a second project team member (JD); differences of opinion between these two were discussed until consensus was reached. Where there was any doubt, a third researcher was consulted (JYLR). In addition, interpretability was also described. Interpretability is the degree to which one can assign qualitative meaning to quantitative scores. This means that investigators should provide information about clinically meaningful differences in scores between subgroups, floor and ceiling effects, and the minimally important change.

Quality appraisal
For each instrument, the quality of nine measurement properties and interpretability were appraised (see Table 2) and described in the validation studies in two ways: first, the quality of the methods used to evaluate the measurement properties of the instrument (from here on referred to as the appraisal of methodological quality) was appraised; secondly, the measurement properties, based on the results of the validation studies, were appraised. Data from these two appraisals were combined to provide a best-evidence synthesis of the measurement properties for each instrument included.

Appraisal of methodological quality
The COSMIN criteria (21,23) were used to assess methodological quality. The COSMIN checklist describes how nine different measurement properties should ideally be evaluated and provides scoring criteria for the methodological quality appraisal. The quality of the methods used to evaluate each measurement property is scored on a four-point rating scale: "excellent", "good", "fair", or "poor". An additional box was used to assess requirements for studies that used Item Response Theory (IRT). For interpretability, two aspects were evaluated: floor and ceiling effects, and minimally important change and minimally important difference values. More information on COSMIN and the checklist items can be found on the website: https://www.cosmin.nl/.

Appraisal of the measurement properties
Criteria developed by Terwee et al (22) and Schellingerhout et al (24,25) (see Table 3) were used to rate each measurement property of an instrument within a particular study with three possible quality scores: a positive rating (labeled +), an inconclusive rating (labeled ?), and a negative rating (labeled -).

Best-evidence synthesis
Some studies evaluated the same measurement properties for a specific questionnaire. In that event, the results from the different articles were synthesized, as suggested by Terwee et al (22). The quality of a particular measurement property was determined using the method recommended by Schellingerhout and colleagues (24,25). The appraisal of methodological quality of the studies (see 2.4.1), the appraisal of the measurement property (see 2.4.2), the number of studies assessing the property, and the consistency of the results in the case of multiple validation studies were taken into account. For this overall rating, five levels of evidence were applied (see Table 4). Four members of the research team (TP, BP, JD, JYLR) rated the methodological quality and measurement property of each article, with discrepancies discussed until consensus was reached. One team member then performed the best-evidence synthesis (JD) and a second checked it (JYLR).
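To illustrate how the ingredients of such a synthesis combine, the sketch below encodes a hypothetical, simplified set of decision rules. The actual levels of evidence are defined in Table 4 of the review (not reproduced here), so the thresholds in this function are illustrative assumptions, not the published rules.

```python
# Hypothetical sketch of a best-evidence synthesis step: combining the
# methodological quality of each validation study with its per-property
# rating into one evidence level. The decision thresholds below are
# illustrative only; the review's actual rules are given in its Table 4.

def synthesize(ratings):
    """ratings: list of (methodological_quality, property_rating) pairs,
    e.g. [("good", "+"), ("fair", "+")]. Returns an evidence level."""
    # Studies of poor methodological quality contribute no evidence.
    usable = [(q, r) for q, r in ratings if q != "poor"]
    if not usable:
        return "unknown"
    signs = {r for _, r in usable}
    if len(signs) > 1:
        return "conflicting"          # inconsistent results across studies
    sign = signs.pop()
    strong = [q for q, _ in usable if q in ("good", "excellent")]
    if len(strong) >= 2 or "excellent" in strong:
        return "strong " + sign
    if strong:
        return "moderate " + sign
    return "limited " + sign          # only fair-quality evidence

print(synthesize([("good", "+"), ("good", "+")]))  # → strong +
print(synthesize([("poor", "+")]))                 # → unknown
```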

Studies included
Electronic searches identified 2775 articles. Another 236 articles were identified by the citation check of all articles that were eligible for inclusion in this systematic review. Title and abstract screening resulted in the exclusion of 2797 records. The remaining 214 full-text articles were retrieved and assessed for eligibility. In total, 37 articles met the inclusion criteria, of which 21 were derived from the primary search and 16 from the citation check. After removing duplicates, 29 articles were included in the synthesis. The main reason for exclusion was that articles did not assess the psychometric properties of the related instrument (97 articles). The 29 articles included described the development and/or evaluation of 29 instruments that assessed patients' experiences in primary care.

Overview of studies
Table 5 gives an overview of the measurement tools and Table 6 provides an overview of the studies included. In total, 29 studies were included in the review, reporting on 29 different tools. 23 studies reported on the initial development and validation of a tool. The others reported on further development of an existing tool (with a different sample, assessing a different psychometric property) or compared several instruments. Most of the studies were from the UK (N = 12) and the USA (N = 7). All studies reported on the validation of tools in English. The number of items in the tools ranged from 6 to 84. Of the 29 tools, 19 used a five-point Likert scale for response categories. Sample sizes of the studies varied from N = 21 to N = 190,038 patients. Table 5 should appear here. Table 6 should appear here.

Constructs captured by the measurement tools included
Details of the subscales used to capture patients' experiences of primary care were extracted from the included tools (see Table 5). Eight studies did not report any subscales.
Constructs captured by the tools included illustrated a diversity of conceptualizations of patients' experiences of primary care, captured by a wide range of different subscales (see Table 7). There was no clear consensus across the tools about which aspects of the patient experience in primary care should be measured. Each measurement tool captured a different conceptualization of patients' experiences of primary care, with approximately 58 constructs in total. A synthesis of these constructs is shown in Table 7, grouped into the nine dimensions of patient-centered care as defined by the IOM. No tool captured all nine domains. Figure 2 gives an overview of the percentage of the included tools covering each domain. The domain most frequently assessed was "respect for patients' values, preferences, and expressed needs" (59%). The least frequently assessed domains were "physical comfort" and "involvement of family and friends" (3.45%). Table 7 should appear here.

Quality of design, methods and reporting
Table 8 provides an overview of the assessment of the methodological quality of the studies included, using the COSMIN criteria and checklist with four-point scale ratings. While most studies used classical test theory (CTT), one study used item response theory (IRT). No single study assessed all the measurement properties. Across all the measurement tools included, an average of only three of the nine possible psychometric properties had actually been assessed. Table 8 should appear here. For interpretability, all the studies reported the way in which missing items had been handled. 11 studies reported the percentage of respondents with the highest possible score and the lowest possible score. Neither minimally important change (MIC) nor minimally important difference (MID) was assessed in any study.
For generalizability, most studies reported the sampling method and a description of the sample. The most common approach was convenience sampling. Most studies included patients with a wide age range. A balanced gender distribution was achieved in all the studies. All the studies had been conducted in Western countries.
Overall results of the best-evidence synthesis of the instruments included
The best available evidence (Table 9) was unknown for 50% or more of the instruments across all their measurement properties. Table 9 should appear here.

Discussion

Heterogeneity of the constructs
Many questionnaires have been created to capture patients' experience in primary care. After examination, there appears to be a huge diversity in the definitions of quality of primary care. This lack of consensus makes it difficult to choose an instrument. Questionnaires identified by Lévèque et al. (20) captured constructs covering the following: community orientation (equity, community participation), patient-centered care (global care, family-centered care, cultural sensitivity, patient-doctor relationships, respect, communication), attributes of clinical care (technical quality, accessibility, continuity, care management, comprehensiveness) and structural dimensions (information management, multi-disciplinarity). This review found all of these constructs. It is, therefore, able to offer a more comprehensive definition of the quality of primary care, one which is also in line with previous studies (20,55). By classifying the tools according to the domains they capture, designers of a study into the quality of primary care can choose the questionnaire best suited to fulfill their research objectives.

Lack of validity
Content validity is a widely explored property (22). By its nature, no gold standard can be designed for a tool that assesses quality of care. Some studies use the following additional question in their psychometric assessment as a "gold standard": "Are you satisfied with the quality of care you have just received?". It is a coherent approach to statistical analysis since it makes it possible to compare a subjective datum, theoretically containing an infinite number of domains, with a test containing a finite number of domains. This question, however, can only be used in statistical analysis, since it is not sufficiently descriptive to constitute an evaluative tool (56).
Some properties were not applicable, including cross-cultural validity, which aims to analyze the validity of the translation of a tool. Only questionnaires that have been translated, into French for example, can have their cross-cultural validity evaluated.

Lack of reliability
To evaluate the reliability of an instrument, it is necessary to measure its internal consistency as well as its measurement error (57). In this review, many authors used internal consistency as the sole indicator of reliability, which is an inadequate assessment. Moreover, a validity analysis of a tool can only be undertaken after an analysis of its reliability: a tool can only be valid if it is reliable (58). The lack of sound methodology for assessing the reliability of most of the studies included in this review makes interpretation of their validity questionable.
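The two reliability ingredients mentioned above can be computed from item-level data. The sketch below shows Cronbach's alpha (the usual internal-consistency coefficient) and the standard error of measurement (an absolute measurement-error index); the Likert responses are made-up toy data, not values from any study in this review.

```python
# Minimal sketch: Cronbach's alpha (internal consistency) and the standard
# error of measurement (SEM, an absolute measurement-error index).
# The toy data are hypothetical 5-point Likert responses.
import statistics

def cronbach_alpha(items):
    """items: list of per-item score lists, one entry per respondent.
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)."""
    k = len(items)
    item_vars = [statistics.variance(item) for item in items]
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum(item_vars) / statistics.variance(totals))

def sem(total_scores, reliability):
    """Standard error of measurement: SD of total scores * sqrt(1 - reliability)."""
    return statistics.stdev(total_scores) * (1 - reliability) ** 0.5

# Three items answered by five respondents (hypothetical data)
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 4, 2, 4, 3],
]
alpha = cronbach_alpha(items)
totals = [sum(s) for s in zip(*items)]
print(round(alpha, 2), round(sem(totals, alpha), 2))  # → 0.86 1.12
```

A high alpha alone says nothing about absolute error, which is why reviews such as this one expect both to be reported.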

Lack of insight into the ability to measure change and to interpret change
No study evaluated the responsiveness of its questionnaires. Responsiveness measures the ability of an instrument to detect change in the data measured over time. For an investigator who wants to study the effects of an intervention on quality of care, it is highly important to know the responsiveness of the instrument. Finally, it is relevant to know, for a given tool, the Minimally Important Change (MIC) or the Minimally Important Difference (MID).

Strengths and limitations of the review
This review was based on a published methodology and followed all the standards required for a systematic literature review (59). It appears to be the first to apply the COSMIN method to the evaluation of the psychometric properties of self-report questionnaires, from the perspective of patients, in the field of the quality of primary care.
First, two raters (or four when necessary) evaluated the eligibility of articles, extracted the data, and performed the quality appraisal for each measurement property. Therefore, the results are robust. Second, to provide an unbiased appraisal of the measurement quality of the instruments included, all the results and the methodological quality of all their validation studies were taken into account. In addition, the rating of methodological quality was based on the widely accepted COSMIN standards. Third, due to the high number of instruments included, an insight is provided into overall trends regarding property measurement evaluations, their quality, and the overall quality of instruments. This insight makes it possible to provide general recommendations on how to improve instruments, and their validation studies, when assessing the quality of patients' experiences in primary care.
The study had some limitations. First, although the initial search strategy was designed for high sensitivity, to be eligible for inclusion an article had to describe a study that aimed to develop or validate an instrument evaluating patients' experiences of primary care. Consequently, relevant articles could have been missed if the development or validation of an instrument was not explicitly mentioned in either the title or the abstract. In addition, as one of the aims of the study was psychometric analysis, all questionnaires found to lack a psychometric analysis were excluded from this review. A selection bias is therefore possible for tools without any psychometric analysis. The methodological analysis was performed using the COSMIN criteria.
This methodology has been put in place to minimize information biases. Nevertheless, the possibility of this type of bias remains.
Interest and future implications
This is the first systematic review of the literature on patient self-assessment tools concerned with the quality of primary care that includes an analysis of their psychometric properties. It enables designers of primary care quality studies to understand the strengths and limitations of existing instruments in terms of captured constructs and psychometric properties. It also reveals, for a given questionnaire, the weaknesses of its psychometric properties and possibly creates the opportunity to design a study to reinforce these properties. Despite a growing interest in evaluating the quality of care and the abundance of instruments validated within the framework of hospital structures, particularly in the context of their accreditation, few psychometric evaluations of instruments have been developed in primary care. Future research will need to develop or validate a generic tool for assessing quality of care in primary care. This instrument should be reliable, valid, responsive and interpretable so that it can be used within a healthcare system and so that it allows comparability, both in space (comparing the quality of care of two healthcare homes to define the optimal organization) and over time (comparing the results before and after an intervention to measure the benefit). This review could be the starting point for such work and provide a solid foundation. From this point, researchers will be able to identify the most suitable questionnaire for their work and supplement or reinforce the psychometric analysis. Later work could involve translating it, adapting it to different cultures and finally testing it in each environmental location. This can also be the starting point for the creation of a new and better-validated health assessment questionnaire on the quality of primary care.

Conclusions
This systematic review shows that many patient self-assessment tools exist which are concerned with the quality of primary care and which include an analysis of their psychometric properties. However, comparison across instruments highlighted a wide variability in terms of captured constructs and psychometric properties. It also reveals, for a given questionnaire, the weaknesses of its psychometric properties. High-quality studies are needed to reinforce these properties or to develop a generic tool for assessing quality of care in primary care. This instrument should be reliable, valid, responsive and interpretable.

Inclusion criteria
1- The article had to investigate a self-report questionnaire,
2- The article had to describe a primary study in which the development or evaluation of one or more instruments occurred,

3- Instruments under investigation:
a. Were developed with the aim of measuring the process of health care delivered to a patient (with or without caregiver) by at least two primary health care providers,
b. Were developed or evaluated in terms of their ability to measure patients' experience of primary care. To guarantee a focus on quality of care, these instruments should assess at least three dimensions of quality of care.
4- The article had been peer-reviewed,
5- The article was written in English or French.

Exclusion criteria
To guarantee that the instrument under investigation measured patients' experience of primary care, the following three exclusion criteria were applied:
1- Articles investigating instruments that measure a health outcome such as quality of life, health status, burden of disease, handicap… that did not include the process of care,
2- Articles about instruments evaluated in healthcare establishments rather than in general-practitioner-centred settings, such as emergency or medical home care,
3- Articles about instruments evaluated in a restricted sample (ageing, specific condition, specific gender…).

Internal consistency
The degree to which items in a (sub)scale are inter-correlated, thus measuring the same construct.

Reliability
The extent to which subjects can be distinguished from each other, despite measurement errors (relative measurement error).

Measurement error/Agreement
The degree to which the scores on repeated measurements are close to each other (absolute measurement error).

Content validity
The degree to which the instrument is an adequate reflection of the construct to be measured.

Construct validity
Structural validity The degree to which the scores of the instrument are an adequate reflection of the dimensionality of the construct to be measured.

Hypotheses testing
The degree to which the scores of the instrument are consistent with hypotheses, based on the assumption that the instrument validly measures the construct to be measured.
Cross-cultural validity The degree to which the performance of the items on a translated or culturally adapted instrument are an adequate reflection of the performance of the items of the original version of the instrument.

Criterion validity
The degree to which the scores of the instrument are an adequate reflection of a 'gold standard'.

Responsiveness
The ability of the instrument to detect changes over time in the construct measured.

Interpretability
Interpretability is the degree to which one can assign qualitative meaning, that is, clinical or commonly understood connotations, to an instrument's quantitative scores or change in scores.

? Not able to score because of unclear or missing information, e.g. SEM or SDC not calculated, or MIC not defined.
Content validity
+ Target group and/or experts considered all items to be relevant AND considered the items to be complete.
? Not able to score because of unclear or missing information, e.g. no results on item relevance according to experts' reports.

Construct validity
Structural validity
+ For exploratory factor analyses: factors chosen explain at least 50% of variance, OR factors chosen explain less than 50% of variance but the choice is justified by the authors. For confirmatory factor analyses: the goodness-of-fit indicators fulfil the following requirements: (CFI or TLI or GFI or comparable measure > 0.90) AND (RMSEA or SRMR < 0.08), AND results confirm a model with the original factor structure OR a model with slight changes if these changes are justified by the authors.
? For exploratory factor analyses: Not able to score because of unclear or missing information, e.g. explained variance not mentioned. For confirmatory factor analyses: Not able to score because of unclear or missing information, e.g., no fit indices are presented.
Hypothesis testing
+ (At least 75% of the results are in accordance with the hypotheses AND, if calculated, the correlation with an instrument measuring the same construct is ≥ 0.50) AND correlations with related constructs are higher than with unrelated constructs, if calculated.
? Not able to score because of unclear or missing information, e.g. no correlations with related construct are calculated.
Cross-cultural validity
+ The original factor structure is confirmed AND no important DIF found. If only one of these properties is investigated: either the factor structure is confirmed OR no important DIF is found.
? Not able to score because of unclear or missing information, e.g. no confirmative factor analyses is performed nor is the DIF investigated.
? Not able to score because of unclear or missing information.
? Not able to score because of unclear or missing information, e.g. no correlations of change score with related constructs are calculated or no AUC investigated.
- Change score correlation with an instrument measuring the same construct < 0.40

Only studies of poor methodological quality.
A plus sign (+) indicates positive results for a measurement property evaluation and a minus sign (-) indicates negative results, e.g. + stands for limited evidence of positive results and --- stands for strong evidence of negative results for a measurement property.
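The confirmatory-factor-analysis part of the structural-validity criterion above reduces to a simple decision rule. The sketch below encodes only the fit-index portion (CFI-type index > 0.90 AND RMSEA or SRMR < 0.08, with missing indices yielding "?"); the full criterion additionally requires that the original factor structure be confirmed, which is not modeled here.

```python
# Sketch of the fit-index part of the confirmatory-factor-analysis rating
# rule quoted above: "+" requires (CFI or comparable > 0.90) AND
# (RMSEA or SRMR < 0.08); missing information yields "?".
# The requirement that the original factor structure be confirmed is
# deliberately omitted from this simplified sketch.

def rate_cfa(cfi=None, rmsea=None):
    if cfi is None or rmsea is None:
        return "?"   # not able to score: unclear or missing fit indices
    return "+" if cfi > 0.90 and rmsea < 0.08 else "-"

print(rate_cfa(cfi=0.95, rmsea=0.05))  # → +
print(rate_cfa())                      # → ?
print(rate_cfa(cfi=0.85, rmsea=0.05))  # → -
```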