Design and participants
This was a development and validation study conducted in primary care practices (general internists) in France and the French-speaking part of Switzerland between 2014 and 2018.
Figure 1 summarizes the development and validation process of the barriers to depression care questionnaire (BDC-Q) with participants’ socio-demographic characteristics.
Step 1 - Item development
We devised 42 items from a literature search conducted in MEDLINE, using terms related to depression, family practice, and barriers/facilitators, and from the bibliographies of retrieved studies. We included qualitative and quantitative studies referring wholly or in part to barriers to depression care. We excluded studies that did not involve GPs’ views or that focused exclusively on specific aspects of depression care or specific populations (e.g. depression in youth, end-of-life care or postpartum depression). We retained 11 studies from the USA,8,11,12 the UK,13 France,14 the Netherlands,15 Australia,16,17 and Hong Kong,18 as well as 2 international meta-syntheses.9,19 We also added the results of a focus group conducted within a local depression improvement program in the French-speaking part of Switzerland (unpublished results). Any element describing an implicit or explicit barrier was extracted, adapted and translated into French to form an item.20
Step 2 - Content validation
Content validation was undertaken with a French-speaking panel of 19 physicians with expertise in the management of depression in PC. Physicians were GPs, psychiatrists and academics from Lyon (France), Strasbourg (France) and Geneva (Switzerland). Using the content validation index described by Polit, experts rated the relevance and the clarity of each item on a 4-point scale (not relevant, quite relevant, relevant, highly relevant; respectively: not clear, quite clear, clear, very clear).21 An item was considered relevant if more than 75% of experts rated it as “quite relevant” or “highly relevant”. An item was considered clear if 80% or more of experts rated it as “quite clear” or “clear”.21 Among the 42 items submitted to the binational expert panellists, 16 irrelevant items were dropped. The remaining 26 items were retained for pretesting; relevant but unclear items (n = 5) were modified according to expert comments or by returning to the literature.
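As an illustration, the relevance and clarity screening described above amounts to simple proportion thresholds. The sketch below applies those thresholds to a single item; the ratings are invented for illustration, and which labels count as endorsing follows the text:

```python
# Sketch of the item-level relevance/clarity screening described above.
# The ratings below stand in for the 19 experts' judgements and are
# made up for illustration.

def proportion_endorsing(ratings, endorsed_labels):
    """Fraction of experts whose rating falls in `endorsed_labels`."""
    return sum(r in endorsed_labels for r in ratings) / len(ratings)

def screen_item(relevance_ratings, clarity_ratings):
    # Thresholds from the text: >75% for relevance, >=80% for clarity.
    relevant = proportion_endorsing(
        relevance_ratings, {"quite relevant", "highly relevant"}) > 0.75
    clear = proportion_endorsing(
        clarity_ratings, {"quite clear", "clear"}) >= 0.80
    return relevant, clear

# Hypothetical ratings from 19 experts for one item:
relevance = ["highly relevant"] * 12 + ["quite relevant"] * 4 + ["not relevant"] * 3
clarity = ["clear"] * 14 + ["quite clear"] * 2 + ["not clear"] * 3
print(screen_item(relevance, clarity))  # (True, True): 16/19 on both counts
```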
Step 3 - Pretesting
Pretesting of the questionnaire was undertaken to provide response process validity evidence, defined as the fit between the construct and the detailed nature of the response engaged in by test takers.22,23 We used individual semi-directive cognitive interviewing techniques24 among 20 GPs recruited through snowball sampling for maximal variation.25 None of the GPs in this step were involved in the content validation step. Two trained investigators (LL and AG) performed the interviews in the GPs’ practices. Participants read and answered each item aloud before responding to standardised question probes.26 The probes explored item understanding (e.g. “In the item ‘mental health care professionals are available to take on new patients’, what does the word ‘available’ mean to you?”), redundancy between items, and the response selection process27 (e.g. “How did you decide your answer to this question was strongly agree?”). All the cognitive interviews were recorded, transcribed verbatim and independently coded by LL and AG using a 12-point coding sheet.26 For each interview, items were coded either as “adequate” or with a combination of 11 problematic codes (e.g. “Respondent unsure how to answer since experience varies depending on circumstances”; “Respondent asks for clarification of the item”). An item was judged satisfactory if it was coded “adequate” for more than 85% of respondents.26
Among the 26 tested items, 13 were adequate and retained without modification. The 13 problematic items were revised through discussion among the research team, in line with suggestions made during the cognitive interviews. In this process, 2 derived items were added, leading to a 28-item questionnaire.
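The 85% adequacy rule applied above can be sketched as follows; the item names and codes in this example are invented, and only the distinction between “adequate” and any problematic code matters:

```python
# Sketch of the adequacy rule from the cognitive interviews: an item is
# satisfactory if coded "adequate" for more than 85% of the 20 GPs.
# The codes below are invented for illustration.

def item_satisfactory(codes, threshold=0.85):
    """codes: one code per respondent ('adequate' or a problematic code)."""
    return sum(c == "adequate" for c in codes) / len(codes) > threshold

item_codes = {
    "item_1": ["adequate"] * 20,
    "item_2": ["adequate"] * 18 + ["asks for clarification"] * 2,
    "item_3": ["adequate"] * 15 + ["experience varies"] * 5,
}
retained = [name for name, codes in item_codes.items() if item_satisfactory(codes)]
print(retained)  # ['item_1', 'item_2'] -- 18/20 = 0.90 passes, 15/20 = 0.75 fails
```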
For all items, a traditional 5-point Likert scale was suitable, ranging from strongly disagree to strongly agree.20 The midpoint was labelled “no answer” to capture either a neutral position or the absence of an opinion. To obtain a balanced questionnaire, half of the items were positively worded (e.g. “It is easy to distinguish between simple sadness and a depressive disorder”) and the other half negatively worded (e.g. “Obtaining feedback on patients from mental health care professionals is difficult”).
Steps 4 and 5 - Testing phase and Test-Retest reliability
The testing phase of the BDC-Q was carried out among GPs from the Alsace and Rhone-Alpes regions of France. Participants were recruited through regional professional organisation mailing lists. Surveys were web-based and gathered socio-demographic characteristics. The questionnaire items were displayed in random order to avoid response contamination bias.28 We prevented missing data by requiring a response to every item. A total of 131 GPs initiated the survey (response rate of 13.3%), with a completion rate of 88.5% (15 incomplete surveys were excluded). Thus, 116 surveys were used for the analyses in the testing phase. We used principal component analysis followed by internal consistency testing to organise the items into descriptive dimensions. Test-retest reliability was assessed with a sub-sample of 40 GPs, who were asked to respond to the survey again after a 14-day interval.
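The respondent counts above follow simple arithmetic; a quick consistency check using only the figures reported in the text:

```python
# Consistency check of the testing-phase counts reported above.
initiated = 131        # GPs who started the web survey
incomplete = 15        # surveys excluded as incomplete
analysed = initiated - incomplete
completion_rate = analysed / initiated

print(analysed)                         # 116 surveys used for analysis
print(round(completion_rate * 100, 1))  # 88.5 (% completion)
```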
Statistical analysis
We calculated descriptive statistics for each item, including the mean, standard deviation and range, to inspect floor and ceiling effects. Items endorsed by more than 95% of the participants were considered for removal.29 We performed a principal component analysis (PCA) with Promax rotation to aggregate BDC-Q items into factors, after using the Kaiser-Meyer-Olkin (KMO) index of sampling adequacy to confirm the suitability of the data. We used combined criteria (i.e. eigenvalue > 1 and interpretability) to retain the most relevant factors and order them into consistent dimensions.30 A minimum factor loading of 0.40 was used as the criterion for each retained item.31 An item with both primary and secondary loadings above 0.40 was assigned to the dimension that made the most theoretical sense. We used Cronbach’s alpha coefficient to assess the internal consistency of the dimensions. Our intention was not to create subscales, but to check that the items were sufficiently related to justify their grouping under the different dimensions. We chose the following critical values for internal consistency: α > 0.75 = excellent, 0.60 ≤ α ≤ 0.75 = good, 0.40 ≤ α < 0.60 = moderate, and α < 0.40 = poor.32 We used the Pearson correlation coefficient to determine the test-retest reliability of the dimensions between time 1 (T1) and time 2 (T2), with the following critical values for Pearson’s r: r > 0.5 = high, 0.3 < r ≤ 0.5 = moderate, and r < 0.3 = low.33 All statistical analyses were conducted using IBM SPSS Statistics version 24.
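The core reliability computations above can be sketched in Python as a minimal illustration on simulated data. This is not the authors’ SPSS analysis: the 4-item dimension, sample size, and noise levels below are invented, and the Promax rotation, KMO index and loading criteria are omitted; only the eigenvalue > 1 rule, Cronbach’s alpha and the test-retest Pearson r are shown:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def classify_alpha(a):
    # Critical values given in the text.
    if a > 0.75:
        return "excellent"
    if a >= 0.60:
        return "good"
    if a >= 0.40:
        return "moderate"
    return "poor"

# Simulated 5-point Likert responses for one hypothetical 4-item
# dimension answered by 116 GPs (a shared latent trait plus noise).
rng = np.random.default_rng(0)
latent = rng.normal(size=(116, 1))
scores = np.clip(np.rint(3 + latent + 0.8 * rng.normal(size=(116, 4))), 1, 5)

# Kaiser criterion: components with eigenvalue > 1 on the correlation matrix.
eigenvalues = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))
print("components retained:", int((eigenvalues > 1).sum()))

alpha = cronbach_alpha(scores)
print("alpha:", round(alpha, 2), "->", classify_alpha(alpha))

# Test-retest: Pearson r between dimension scores at T1 and a simulated T2.
t1 = scores.mean(axis=1)
t2 = t1 + 0.3 * rng.normal(size=116)
r = np.corrcoef(t1, t2)[0, 1]
print("test-retest r > 0.5 (high):", r > 0.5)
```

In practice the KMO index and rotated PCA could be obtained from a dedicated factor-analysis package rather than hand-rolled as here.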