Desirable features in a decision aid for prenatal screening – what do pregnant women and their partners think? A mixed methods pilot study

To help pregnant women and their partners make informed, value-congruent decisions about Down syndrome prenatal screening, our team developed two successive versions of a decision aid (DAv2014 and DAv2017). We aimed to assess pregnant women’s and their partners’ perceptions of the usefulness of the two DAs for preparing for decision making, their relative acceptability and their most desirable features. This is a mixed methods pilot study. We recruited participants (women and their partners) when they consulted for prenatal care at three clinical sites in Quebec City. To be eligible, women had to: (a) be at least 18 years old; (b) be more than 16 weeks pregnant or have given birth in the previous year; and (c) be able to speak and write in French or English. Both women and partners were invited to give their informed consent. We collected quantitative data on the usefulness of the DAs for preparing for decision making and their relative acceptability. We developed an interview grid based on the Technology Acceptance Model and the Acceptability questionnaire to explore their perceptions of the most desirable features. We performed descriptive statistics and deductive analysis. We assessed pregnant women’s and their partners’ perceptions of the usefulness of two DAs for prenatal screening for preparing for decision making, their relative acceptability and their most desirable features. Our results reflect the paradox inherent in the design of all DAs, i.e. the challenge of designing DAs that achieve the correct balance between simplicity and enough information. A new, user-centered version of the prenatal screening DA will integrate participants’ suggestions to reflect end users’ priorities. The new version will include all decision points, information presented in a more balanced and accessible manner, and a simpler values clarification exercise, and will include partners as decision-makers.


Introduction
Study design
This is a convergent parallel mixed methods pilot study (29). We collected and analyzed quantitative and qualitative data separately before merging them. This study was approved by the Ethics Committee of Centre intégré universitaire de santé et de services sociaux de la Capitale-Nationale (Project 2017-2018-15 MP).

Settings
We conducted the study in three clinical prenatal care sites in Quebec City: (i) the Maison de naissances de la Capitale Nationale (a birthing centre); (ii) the Maizerets Family Medicine Group (FMG); (iii) the Obstetrics and Gynecology Department of Saint François d'Assise Hospital. The objective was to recruit pregnant women in different socio-demographic milieus and who were being followed by different kinds of healthcare professionals: midwives (birthing centre), family physicians (family medicine group) and obstetricians (hospital).

Participants and recruitment
Study participants were pregnant women with or without their partners and women who had recently given birth. To be eligible, women had to: (a) be at least 18 years old; (b) be more than 16 weeks pregnant (we did not want to influence the outcome of their decision, and in Quebec, at 16 weeks women will have already taken the public screening test or refused it) or have given birth in the previous year; (c) be able to speak and write in French or English; (d) be able to give informed consent. We excluded women who: (i) participated in certain phases of our earlier studies on DAs (30,31) and/or (ii) presented a high-risk pregnancy (e.g. preeclampsia, gestational diabetes, multiple pregnancy).
We approached managers of the three clinical prenatal care sites in Quebec City and obtained their consent for recruitment. We conducted the recruitment from May 2018 to February 2019. We approached participants in the waiting rooms of clinical sites, where a research assistant informed them about the project and evaluated their interest in participating in the study. The research assistant gave consent forms to participants who volunteered to participate in the study and noted their contact details on the recruitment form in order to follow up with them and to find a date for the meeting. Participants were invited by phone or email to a meeting of about 60 to 90 minutes either at our research center, in a room at the clinical site, or in a place of their choice (e.g. home) depending on their preference.

Data collection
We collected data focusing on two prenatal screening decision aids whose features we detail below. Table 1 presents the differences between the DAs. The DAv2014 is a four-page DA that presents two options: to do the test or not to do it. Specifically, it includes (i) evidence-based information on Down syndrome (DS) and the risk of having a baby with DS given the mother's age; (ii) evidence-based information on the advantages and disadvantages of doing the test or not; (iii) evidence-based representations of the probabilities of false positives/negatives for the tests offered in the Quebec public healthcare system; (iv) a work chart to help women reflect on and write down the advantages and disadvantages of doing the test or not in terms of what is most important to them; (v) a box to note the decision taken and (vi) the SURE test (Sure of myself; Understand information; Risk-benefit ratio; Encouragement/support) for evaluating the person's certainty about the decision made (32).

Procedure
We administered questionnaires to individual women and couples, to be self-completed, and conducted individual interviews with women or dyadic interviews with women and their partners. We pre-tested the questionnaires and the interview guide with two participants (pregnant women) and improved them following their feedback. At the beginning of the meeting, the research assistant welcomed participants and presented the context and purpose of the study. Then she collected signed consent forms and ensured that participants had no questions about the study. After making sure that the participants were ready to start the study, the research assistant began data collection. In step 1, she distributed questionnaire 1 (items on socio-demographic information). Only one questionnaire was given to couples, who were invited to complete it together. In step 2, participants watched the beginning of a video demonstrating the use of a DA by a clinician and a couple (about 2 minutes). In step 3, they received and read DAv2017. The women and couples were asked to imagine making a prenatal testing decision with the help of the DA. In step 4, participants received questionnaire 2 (items on acceptability and usefulness of the DA to prepare for decision-making, in relation to DAv2017) and completed it alone or with their partner. In step 5, they received and read DAv2014. In step 6, participants filled in questionnaire 3 (items on usefulness to prepare for decision-making, acceptability, and relative acceptability of the two DAs). The last step consisted of interviews, using an interview guide, on users' opinions on the various features of the DAs. The encounters lasted about 60 to 90 minutes.

Questionnaire material
We quantitatively assessed pregnant women's and their partners' perceptions of the usefulness of the two DAs for preparing for decision making and the relative acceptability of the features of the two DAs (e.g. graphic presentation, amount of information).
Usefulness was assessed using the Preparation for Decision Making scale (PrepDM) (33). This is a 10-item scale evaluating how useful the DA was for preparing participants to communicate about the decision with their practitioner in a consultation, for example, "Did this educational material help you recognize that a decision needs to be made?" and "Did this educational material help you think about which pros and cons are most important?" This scale has good psychometric properties with Cronbach's α ranging from 0.92 to 0.96 (33).
Acceptability was assessed using the 10-item Acceptability questionnaire (34). This questionnaire includes both structured and semi-structured questions evaluating the comprehensibility of components, length, amount of information, sufficiency of information, balance in presentation of information about options, and overall suitability of the DA for decision making. It can be used for both patients and practitioners (34).
After participants had read both DAs, we asked them which version of the DAs they found more acceptable, and how much they preferred one version to the other as measured on a 5-point Likert scale with the answer choices: Very little, Little, Moderately, Enough and A lot.

Interview material
We qualitatively explored, in more depth, the features of each DA that participants found most desirable and solicited their suggestions. The development of the interview guide was inspired by the Technology Acceptance Model (TAM) (35) and the Acceptability questionnaire (34). According to the TAM, perceived utility and perceived ease of use are determinants of current usage (35). Perceived utility is influenced by a number of factors such as the perceived complexity of the information. Perceived ease of use is influenced by factors such as ease of navigation. Both constructs are also influenced by factors such as the characteristics of the technology (e.g. perceived attractiveness) and personal characteristics of the user (e.g. perceived pleasure) (35,36). Thus, participants were invited to comment on visuals, colors, ease of navigation, aspects they liked or did not like, most helpful page, and suggestions for improvement of the DA.
Interviews were conducted by TTA, assisted by SAR, MC, MPD, APH or AAT, who took notes during each interview. All interviews were audio recorded with the consent of the participants. At the end of the interview, participants received financial compensation of CAN$25 for their participation in the study.

Sample size
The variable of interest was perceived usefulness of the DAs. This was assessed using the Preparation for Decision Making scale, a five-point Likert-type scale. Considering that this is a pilot study, to limit selection bias we needed 37 pregnant women with or without their partners for evaluating the usefulness of the DAs.
This number corresponds to 10% of the sample size planned for a future province-wide study that aims to produce a DA that will be routinely used in the general population (37). The calculation for that study is based on the determination of power and sample size for linear models (38). We estimated that a sample size of 343 pregnant women with or without their partners would be sufficient to detect a partial correlation of 0.15 between the usefulness of the DA in preparing for decision making and the potentially confounding variables to be included in the model. We used a power of 80% and a statistical significance level of 5%. To account for missing data and to ensure that the sample was large enough to perform subgroup analyses (e.g., by age), we added 10% of its value to the estimated size. Therefore, the total sample for the larger study will be 377 pregnant women and their partners.
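The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the authors' calculation: the function names are hypothetical, rounding down is an assumption chosen because it reproduces the reported 377 and 37, and the Fisher z formula is a textbook approximation for a simple (zero-order) correlation that is swapped in here as a rough cross-check of the order of magnitude. It is not the linear-model power method cited in the text, so it is not expected to return exactly 343.

```python
import math
from statistics import NormalDist


def inflate(n: int, percent: int) -> int:
    """Add `percent` % to n, rounding down (rounding direction is an assumption)."""
    return n * (100 + percent) // 100


def fraction(n: int, percent: int) -> int:
    """Take `percent` % of n, rounding down (rounding direction is an assumption)."""
    return n * percent // 100


def fisher_z_sample_size(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n needed to detect a correlation r with a two-sided test.

    Uses the Fisher z approximation for a zero-order correlation, which is
    NOT the linear-model method referenced in the text.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


full_study_n = inflate(343, 10)        # 343 plus 10% for missing data and subgroups
pilot_n = fraction(full_study_n, 10)   # pilot sized at 10% of the full study
approx_n = fisher_z_sample_size(0.15)  # rough cross-check only

print(full_study_n, pilot_n, approx_n)
```

The approximation lands in the same range as the reported 343, which is all it is meant to show; the exact value depends on the number of covariates in the model, which the text does not give.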

Data analysis
For the quantitative data, we used basic descriptive statistics (means, standard deviations, percentages, and 95% confidence intervals) to describe the sample in terms of socio-demographic characteristics and for all quantitative variables. For Usefulness (Preparation for Decision Making scale), items were summed and scored (divided by the number of items and multiplied by 25). Thus, scores were converted to a 0-100 scale with higher scores indicating higher perceived usefulness for preparing for decision making (33). To assess which DA was more useful overall for preparing for shared decision making, we performed a Student's t-test. We calculated descriptive statistics of the six closed-ended Acceptability scale questions, while the open-ended Acceptability questions were analyzed together with the qualitative data. The comparison questions about the DAs were analyzed descriptively as well. All quantitative analyses were performed using SAS 9.4. No processing of the missing data was done.
For the qualitative part, all the audio recordings were transcribed verbatim. Two team members performed a deductive thematic analysis based on questions inspired by the TAM and the Acceptability questionnaire (34,35). For dyadic interviews (women and partners), the point of view of each member of the couple interviewed was included in the analysis. We proceeded to a within-dyad and across-dyad analysis by identifying points of agreement and disagreement within dyads. We combined patterns and ideas across dyads (39). We used the software NVivo 12 to analyze the qualitative data. Data analysts separately familiarized themselves with the content of the transcripts by reading about half of them. We chose two interviews at random and separately coded them. We created a theory-driven codebook based on our preliminary coding exercise (40). We separately coded all the transcripts while reviewing and revising the codebook in light of the data (40). We created a detailed summary of all themes from the data, specifying the number of statements and the overall trend of opinions on each theme. In two working sessions of three hours each, we cross-checked findings, analyzed the interrelation between themes (40) and reached consensus on discrepancies while going back to the verbatim or NVivo nodes as needed. We reported on the most important findings. We reported qualitative data using the graded quantifier words few, some, many and most (41). Based on Chang et al. (2009) (42), we used "few" when three to nine participants commented on a theme, "some" when 10 to 17 commented, "many" when 18 to 25 commented, and "most" when 26 to 39 participants commented.
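The scoring of the Preparation for Decision Making scale can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the 10 items are assumed to be coded 1 to 5, and the mean is shifted by 1 before multiplying by 25, because with that coding the shift is what makes the result span 0 to 100. The shift is an assumption here and is not spelled out in the text.

```python
def prepdm_score(items: list[int]) -> float:
    """Convert 10 item responses (assumed coded 1-5) to a 0-100 usefulness score."""
    if len(items) != 10:
        raise ValueError("expected 10 item responses")
    if any(not 1 <= value <= 5 for value in items):
        raise ValueError("item responses are assumed to be coded 1 to 5")
    mean = sum(items) / len(items)  # sum the items, divide by the number of items
    return (mean - 1) * 25          # shift to 0-4, then rescale to 0-100 (assumption)


lowest = prepdm_score([1] * 10)
middle = prepdm_score([3] * 10)
highest = prepdm_score([5] * 10)
print(lowest, middle, highest)
```

The function rejects incomplete responses rather than imputing them, in line with the statement that no processing of missing data was done.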
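The graded quantifier rule can be written as a small lookup. This is a sketch for illustration only: the function name is hypothetical, and returning `None` for counts outside the stated bands (for example, fewer than three participants) is an assumption, since the text does not say how such counts were reported.

```python
from typing import Optional

# Bands as stated in the text: (lowest count, highest count, quantifier word).
QUANTIFIER_BANDS = [
    (3, 9, "few"),
    (10, 17, "some"),
    (18, 25, "many"),
    (26, 39, "most"),
]


def graded_quantifier(n_participants: int) -> Optional[str]:
    """Return the quantifier word for the number of participants commenting on a theme."""
    for low, high, word in QUANTIFIER_BANDS:
        if low <= n_participants <= high:
            return word
    return None  # outside the stated bands (assumption: no word is assigned)


print(graded_quantifier(5), graded_quantifier(12), graded_quantifier(20), graded_quantifier(30))
```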

Results
Participants' characteristics
Table 2 shows socio-demographic characteristics of participants and characteristics related to their decision-making. Briefly, the majority of participants were between 25 and 34 years of age (79% of women and 59% of partners), Caucasian (90%), Canadian (90%), married or in a common-law relationship (85%), highly educated (66.7% of women and 54% of partners had a university-level education) and with a relatively high socioeconomic status (46% had an annual family income of CAN$60,000 to $99,999).

Acceptability
Overall, participants rated DAv2017 as more acceptable than DAv2014 in every acceptability category (e.g. amount of information, balance of information). Briefly, 80% of participants rated the presentation of DAv2017 as "excellent/good" against 67% for DAv2014. The amount of information was found "just right" by 80% of participants for DAv2017 against 56% for DAv2014. The worksheet was rated "excellent/good" by 62% of participants for DAv2017 against 51% for DAv2014. Regarding balance, 90% of participants rated DAv2017 "balanced" against 82% for DAv2014. Also, 10% of participants thought DAv2017 was unfairly slanted towards choosing to do the test, and another 10% thought that DAv2014 was unfairly slanted towards choosing not to do the test. For Usefulness, DAv2017 was rated "very useful/useful" by 92% of participants against 74% for DAv2014. Regarding sufficiency of information, 87% of participants said "yes" for DAv2017 against 67% for DAv2014 (Table 3).

Qualitative results
Below we summarize participants' comments along with stated or implied suggestions. We organized them into categories based on combined aspects of the TAM (35,36) and the Acceptability questionnaire (34), namely presentation (graphics, colours, fonts and format), information (amount, content and comprehensibility), values clarification, balance, and navigation. In Table 4, we present quotations illustrating these themes.

to indicate the pros and cons to weigh up, rather than a verbal explanation). Many of them preferred the more sober colors used in DAv2014. Many also liked fewer colors and a paler background, which made the printed information clearer. Many participants preferred the font size of DAv2014 because DAv2017 had more pages of relatively small and non-uniform font (Table 4, Fig. 2).
Participants suggested the use of more graphics and symbols because they are helpful for understanding the message. They suggested avoiding pale print colours, deep background colours (difficult to read) and glossy paper. They suggested larger fonts, i.e. at least 12-point. Even though some participants liked the booklet format presented, they suggested that, to be more convenient, the DA could be in 1) a pamphlet format, 2) a smaller format that matches an existing pregnancy kit distributed to all pregnant women, or 3) a digital version (Table 4, Fig. 2).

Amount of information
Some participants wanted as much information as possible and so found the amount of information in DAv2017 was "just right", while others thought that too much information was intimidating "for women who don't like to read" and that less information would be better. Paradoxically, it was sometimes the same people who wanted more information and yet a simpler DA (Table 4, Fig. 2).

Comprehensibility of information
A few participants found that there were too many acronyms in DAv2017 and that it was easier to navigate in DAv2014. A few participants found the presentation of statistics as frequencies was confusing in both DAs (Table 4, Fig. 2).
To improve the DAs in terms of the information provided, participants suggested adding information about resources for women who decide to go ahead with a Down syndrome pregnancy: how to prepare, resources for children with Down syndrome and their parents, other parents' experiences (e.g. cost, management of children with Down syndrome). They also suggested replacing frequencies with percentages to communicate probabilities, and using fewer acronyms. They suggested inviting the partner to be more involved to make sure both of them are comfortable with the decision taken.

Worksheet - Values Clarification Exercise
Many participants did not feel that the values clarification exercise in DAv2017, designed using the multiple criteria decision-making structure (circling the stars) to evaluate the advantages and disadvantages of each test and the decision-making factors, was useful. Some of them preferred the pros and cons list in DAv2014 and being able to simply write down the factors they felt were most important. They specified that the list should be close to the values clarification information (on the same page). A few participants felt that no values clarification exercise at all was necessary. The information alone was enough to help them make up their mind (Table 4, Fig. 2).
Several alternative solutions to the star-circling exercise were suggested by the participants: 1) using a scale from 0 to 5 with a legend at the bottom, 2) adding lines for taking notes, 3) choice of YES/NO, 4) inviting readers to rank the factors in order of importance before circling the stars, 5) using checkboxes.

Balance
A few participants felt that DAv2017 was weighted towards the decision to take the test because there was little space given to the "not to do the test" option, while the decision to "do the test" included three options, with information about each taking up much more space. One participant felt DAv2014 was oriented towards the decision not to take the test because of the choice of words used. One participant pointed out that using a green background for some options and red for others could be interpreted as "go" and "stop", i.e. subliminally urging the user to choose one over the other (Table 4, Fig. 2).

Data triangulation
The interviews gave us a better understanding of the participants' quantitative assessment of the DAs and showed some interesting inconsistencies with the quantitative data, namely regarding presentation of information and values clarification.
From a quantitative point of view, the choice of participants seemed to be in favor of DAv2017 for all acceptability dimensions, while during the interviews, participants clearly stated that they preferred the presentation of DAv2014. In fact, a detailed analysis of the dimensions of acceptability showed that this inconsistency was also present in the quantitative data: although globally 80% of participants rated the presentation of DAv2017 as "excellent/good" against 67% for DAv2014, more participants (36%) rated the presentation of DAv2014 as "excellent" than DAv2017 (23%) (Table 3).
In addition, participants judged the worksheet of the DAv2017 to be more acceptable than that of the DAv2014 for weighing up pros and cons, and yet in the interviews clearly indicated that the star circling in DAv2017 was not a helpful values clarification exercise. These discrepancies show the importance of mixed methods in user evaluations of decision making tools. The addition of qualitative data provided nuances and correctives to the quantitative data and deeper insights into users' perceptions.

Discussion
In this mixed-methods pilot study, we assessed pregnant women's and their partners' perceptions of the usefulness of the two DAs for prenatal screening for preparing for decision making, their relative acceptability and their most desirable features. Globally, DAv2017 had better scores of usefulness for preparing for decision making and of acceptability than DAv2014. The preferred version of most participants was DAv2017, although they preferred the presentation and the values clarification exercise in DAv2014. Neither DA presented information in a completely balanced manner. These results lead us to make the following observations on the most desirable features in a prenatal screening DA.
First, participants preferred the presentation of DAv2014, suggesting that appealing visuals (esthetics, colors, fonts) favoured acceptability and usability of a DA. This is consistent with other literature relating to evaluation of DAs, where authors report suggested changes to be made to their DAs in terms of font size (43), background colours (43) and colors in general (43-46). According to these studies, participants dislike colors that are "flat", "uninteresting" or "somber". We learned from our study that a font size of at least 12 would be ideal. However, choice of colors remains difficult, because "there's no accounting for taste" and there are no visual standards or guidelines for designing DAs. This highlights the importance of involving potential users in designing the DAs. Researchers could also consult outside the field, such as in the advertising industry, which has extensive experience in presenting material in a visually attractive way.
Second, most participants found that the additional information (on the different tests) in DAv2017 was important. This suggests that the more complete a DA is, the more useful it is for preparing people to make a decision. A previous study has shown that the amount of information can be both a positive and a negative factor (30) depending on user preferences and levels of literacy and numeracy. Indeed, though participants in our study were highly educated, a few participants still felt that DAv2017 was confusing because of the amount of information. To reach the whole population of pregnant women and their partners in Quebec, where over a quarter of the population have literacy problems that affect the management of their health (47), the DA will need to take this critique seriously. In a 2013 review of 97 DA trials, only three DAs overtly addressed the needs of lower health literacy users (48). This is a perennial challenge for DA developers, as participants want as much information as possible in as simple a format as possible. Hence the importance of presenting as much information as possible using graphics, as suggested by study participants, and developing digital versions, in order to add links and hyperlinks toward additional resources (49) or using the guiding principles of the edutainment model for developing low-literacy patient decision aids (50).
Third, of the two values clarification exercises, participants preferred that of DAv2014, which consisted of writing their personal pros and cons of doing the test or not in a summary table. They found that the star circling values clarification system in DAv2017, based loosely on the multiple criteria decision-making model, was useless or too complicated. As we did not provide a legend for interpreting the star circling (51) evaluations (assessment of the performance of alternatives on the criteria), weights (assessment of the relative importance of the criteria), but also an aggregation method (algorithm for synthesis of the above information) (52). Study participants preferred to have other types of values clarification methods or none at all. Indeed, values clarification methods tend to give mixed results: some studies show they improve the decision-making processes (53,54) and others that they have no effect on them (55). However, some authors argue that many people need help in clarifying their values (55, 56) even while using their intuition parallel to this analytical process (57

themselves to be just as important in the decision making as the women (62). For couples who cannot agree on a decision, the DA could facilitate their arriving at a shared decision. Indeed, in our study some couples disagreed over which elements of the DA were most important. The idea of providing gender-specific information (58) should also be explored to promote better understanding and help each partner consider his or her specific role in the decision and its consequences.

Limitations
The strengths of this study should be considered in the context of its limitations. First, participants were highly educated, which does not represent the overall population of pregnant women in Quebec City. In a future investigation we will seek a more heterogeneous group of participants to identify usability issues from a diverse range of perspectives. Second, all participants received DAv2017 before DAv2014. This order may have influenced their opinions. Third, we had a considerable refusal (44%) and abandonment rate (60%) related to encounters being estimated at 60 to 90 minutes long. However, we mitigated this limitation by being flexible about the place and time of the meeting. Finally, we met the participants after they had already made a decision to do the screening test or not. Participants had to imagine they were still in the situation of making the decision to answer the questions. This may have biased their answers. However, the time between their decisions and the study was relatively short.

Declarations
Ethics approval and consent to participate
This study was approved by the Ethics Committee of Centre intégré universitaire de santé et de services sociaux de la Capitale-Nationale (Project 2017-2018-15 MP). The project was described to eligible participants and they were told that the data was anonymous and confidential. Those who wished to participate gave written consent.

Data Availability Statement
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Conflict of Interest Statement
The authors declare that they have no conflict of interest.

Supplementary Files
Supplementary files associated with this preprint: PtDABCM201506English.pdf; PtDA.PEGASUS2017NIPTTier1English.pdf