Development of a Discrete Choice Experiment (DCE) questionnaire to elicit values by pregnant women and decision-makers for the expansion of a NIPS-based prenatal screening program

In an accountable world, being able to take into account the value given by relevant stakeholders to an intervention that could be offered to the population is considered desirable. DCE is an approach particularly suited to the measurement of such values in the field of prenatal care. Yet DCE studies in the field of prenatal screening have focused mainly on pregnant women and their care providers and have neglected another key actor, the decision-makers. The objective of the study was to develop a DCE instrument applicable to pregnant women and decision-makers, for the evaluation of new conditions to be added to a screening program for fetal chromosomal anomalies.

Methods: An instrument development study was conducted. The methods employed included a literature review, a qualitative study (interviews, consultations and a focus group) performed with pregnant women and decision-makers, and a pilot project to test the developed instrument and the feasibility of its administration through an online survey platform.

Abbreviations: HEE: Health Economic Evaluations; CBA: Cost-Benefit Analyses; cfDNA: cell-free Fetal DNA; CHUL: Centre Hospitalier de l'Université Laval; CUA: Cost-Utility Analyses; DCE: Discrete Choice Experiment; HTA: Health Technology Assessment; INESSS: Institut National d'Excellence en Santé et Services Sociaux; ISPOR: International Society for Pharmacoeconomics and Outcomes Research; NIPS: Non-Invasive Prenatal Screening; QALYs: Quality-Adjusted Life-Years; WTP: Willingness-To-Pay


Introduction
Allowing stakeholders such as the population to influence, directly or indirectly, policy decisions is a fundamental component of democratic jurisdictions (1). In the health care sector, this influence is generally referred to by the terms 'participation', 'involvement' or sometimes 'engagement' (2). These terms cover the multiple modalities that allow different stakeholder groups to bring their input into the decision-making process on what services to provide to the population (3).

Health economic evaluations (HEE) are a methodological approach for judging the economic value of a health intervention (4). Considerable effort has been devoted to allowing some participation of the public in the HEE process (5-7). This has led to two approaches that include inputs from either patients or the general population in the estimation of the benefit of having access to a health intervention: cost-benefit analyses (CBA) (via willingness-to-pay and willingness-to-accept analyses) and cost-utility analyses (CUA) (8). Willingness-to-pay (WTP) analyses estimate health benefits as a monetary value whose magnitude reflects the value that survey respondents give to the evaluated intervention. Validity and ethical issues make this approach difficult to apply (9), and it is therefore seldom used. The second approach, CUA, uses an indicator, quality-adjusted life-years (QALYs), to express the benefit of an intervention. QALYs quantify the health outcome provided by an intervention in terms of the gain in length of life, adjusted by a utility or desirability score defined by patients or the general population (10,11).
A key issue faced by these approaches is that different groups of the population might value an intervention, or the health produced by it, differently (12,13). Although CBA and CUA could take into consideration inputs from different groups, they commonly focus on only one group, usually the beneficiaries of the intervention under evaluation (13,14). Researchers who use these approaches seldom consider other groups, such as experts who sit on the committees of Health Technology Assessment (HTA) agencies and who make final recommendations to policymakers about the adoption of a technology. These experts may value the technology differently from the population, based on considerations that are also socially relevant. This observation mainly comes from the fact that WTP and utility instruments are built to express the value given to an intervention by its potential or actual beneficiaries (8).
The issue of participation is even more complicated in the field of prenatal screening for fetal conditions, because utilities cannot be elicited from the fetus (15). The mother's utilities can be used instead, but this would contravene the convention of using the utilities of the main beneficiary of an intervention.
Thus, seeking an alternative modality that facilitates this participation becomes more desirable in the field of evaluation of prenatal screening programs: major advances in prenatal screening and prenatal testing have created an interest in methodological approaches that could overcome the limitations of CUA.
A discrete choice experiment (DCE) is a method used to explore the relative importance of different attributes within a decision-making process (19). A DCE allows a perinatal intervention that concerns a fetus or a newborn to be valued from the adult perspective, which solves the problem of the impossibility of defining utility values for a fetus (20,21). A DCE instrument could also reflect the desirability of a perinatal intervention from the perspective of the population or patients and of other groups, such as decision-makers or those who work in HTA agencies. It is on this last aspect in particular that this work focuses.
This paper reports the development of a DCE instrument to measure the preferences of pregnant women and decision-makers for the expansion of the NIPS test for the detection of fetal chromosomal anomalies in a public prenatal screening program.

Methodology
The development of the DCE questionnaire was undertaken in sequential steps (Fig. 1) based on best practice guidance (19,22-24). The methods and the approach to analysis for each stage are summarized in the following sections.

Attribute identification
The process of attribute identification started with a review of the literature, followed by in-depth interviews with pregnant women and decision-makers.
A systematic review of the literature on the use of DCE in the field of prenatal screening for chromosomal anomalies was conducted by two researchers (HMN, BS). The search strategy was validated by a librarian. Data extracted on the identification of attributes and levels for preferences regarding prenatal screening are presented in additional file 1 (the full unpublished report is available on request). The review identified potential attributes that have been found to influence preferences for undertaking a screening test for a new condition in a NIPS-based prenatal screening program.
A qualitative study was then undertaken to test the attributes suggested by the literature and to identify others that both groups, pregnant women and decision-makers, would regard as important to consider in the decision-making process regarding the content of a public prenatal screening program.

Semi-structured interviews, based on an interview guide (see additional file 2), were conducted with pregnant women (n=12) and decision-makers (n=4). This guide had been pretested beforehand with two pregnant women and one decision-maker. These three persons were not included in the sample for this project. They were only asked to comment on the clarity of the interview guide.
Inclusion criteria for pregnant women were to be primigravida, aged 18 years or above, and consulting at the obstetric department of the CHUL hospital in Quebec City (Canada) for a first prenatal ultrasound. Inclusion criteria for decision-makers were to be either a member of one of the permanent scientific committees of Quebec's HTA agency, INESSS (Institut National d'Excellence en Santé et Services Sociaux), or a public servant at the Ministry of Health and Social Services (Québec), and to be involved in decision-making processes on services to be provided to mothers and children.
All interviews were digitally recorded. The recordings were transcribed verbatim using NVivo Transcription (QSR International 2020). The transcriptions were independently checked by two researchers (HMN, CL) while relistening to the recordings.
The analysis was independently performed by these two researchers (HMN, CL). It aimed at identifying key attributes and their levels. The initial framework used to guide the identification of the attributes was based on the interview guide (deductive approach) (25,26). Additional codes were generated where required (inductive approach) (25,26).
The analysis started with five pre-established dimensions that were expected to come out of the interviews: 1) monthly family expenses associated with a child's disability; 2) prevalence of a new disease added to the list of diseases searched for by NIPS; 3) performance of the test for this new disease, i.e., probability of identifying a child with a disability; 4) probability that a child who tested positive has a severe phenotype; 5) out-of-pocket cost associated with being tested. New dimensions suggested by the interviews were added. Interviews were built in such a way that, at the beginning of the interview, respondents were encouraged to say whatever they thought was an important characteristic to consider when deciding which condition to screen for, without being interrupted or influenced.
The analysis was first based on triangulation (i.e., categories and themes have to be derived from several sources of information) (27). Moreover, all interviews were coded independently by the two researchers. In case of disagreement, discussions were held until a consensus on the final coding was reached.
Attributes identified by the two groups of respondents were merged. All possible levels arising from the interviews were retained. Attributes defined by only one group were also included in the list of attributes/levels for the following selection procedure.
Attribute selection and framing
Attribute selection followed an iterative process aimed at finding a consensus between both groups of participants on a set of attributes to be included in the DCE questionnaire. The iterative approach was based on consultations and a focus group discussion. Discussions were held between research team members after each step to refine the list of attributes/levels.
Firstly, the consultation process was undertaken with the participants who had taken part in the attribute identification step and had agreed to be contacted for future studies. A list of potential attributes and levels retrieved from the previous coding procedure was presented to the participants.
This consultation process aimed to refine the first list of potential attributes and levels. This list was sent to each participant via e-mail, to explore their opinion regarding the dimensions' meaning and relevance. Participants were also asked to provide a justification in case they considered one of the attributes to be irrelevant. Attributes that were considered relevant by the participants were thus retained and modified if needed, while those considered irrelevant were excluded.
In the second step, the list of retained attributes was refined based on a focus group discussion conducted with three pregnant women and one decision-maker solicited from the same hospital (a focus group discussion with decision-makers alone was not held due to the limited pool of potential participants). They were asked to give their opinion on the relevance of attributes and levels.
Information gained from the attribute-selection process was synthesized by one researcher (MHN). The content of attributes and levels was then revised by the research team members to ensure their relevance and the comprehension of the wording.
Eight key attributes and their levels were identified.
Stage 2: Experimental design and construction of tasks
DCE design and construction of tasks followed the 10-point best-practice checklist for conjoint experiment design proposed by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) (22).

Construction of choice tasks
An experimental design was performed in order to generate unforced choice questionnaires (28). The questionnaires consist of a series of two options that differ by their level at each of the questionnaire's dimensions (i.e., attributes). The attributes represent characteristics of a hypothetical additional test that could be added to the list of tests already included in a public prenatal screening program. Respondents must choose the preferred option or declare that they cannot decide which of the two options is the preferred one.

Experiment design
The number of possible combinations (so-called alternatives) is determined by the number of attributes and levels. This experimental setup involved eight attributes (five attributes with two levels; three attributes with three levels), for a total of $2^5 \times 3^3 = 864$ screening combinations in the full factorial design (29).
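
The count above can be checked by enumerating the full factorial directly. The following is a minimal sketch only: it uses placeholder level indices rather than the study's actual attribute levels, and the names `LEVELS_PER_ATTRIBUTE` and `full_factorial` are illustrative assumptions, not part of the authors' workflow.

```python
from itertools import product

# Number of levels for each of the eight attributes:
# five two-level attributes followed by three three-level attributes.
# Level indices are placeholders, not the study's real level labels.
LEVELS_PER_ATTRIBUTE = [2, 2, 2, 2, 2, 3, 3, 3]


def full_factorial(levels_per_attribute):
    """Return every combination of level indices (one index per attribute)."""
    return list(product(*(range(n) for n in levels_per_attribute)))


profiles = full_factorial(LEVELS_PER_ATTRIBUTE)
print(len(profiles))  # 864
```

Each element of `profiles` is one hypothetical screening combination; a fractional design retains only a subset of these rows.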

Since this number was too high, we built a fractional factorial design using an efficient design (rather than an orthogonal design, because of the latter's limitations in estimating interaction effects) to reduce the sample of scenarios to a manageable level while still being able to evaluate the effect of each attribute and their two-way interactions (28,30). The design allowed the scenarios to be defined based on orthogonality (no option dominates another), level balance (all levels of each attribute have an equal frequency), and minimum overlap (there is no overlap in attribute levels) (22,28,30).
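
Two of the design properties named above, level balance and overlap, are simple to verify on a candidate design. The sketch below is an illustration on a toy design with hypothetical values; it is not the software or the design used in the study, and the function names are assumptions.

```python
from collections import Counter


def is_level_balanced(profiles):
    """True if, for every attribute, each observed level appears equally often."""
    n_attributes = len(profiles[0])
    for a in range(n_attributes):
        counts = Counter(p[a] for p in profiles)
        if len(set(counts.values())) != 1:
            return False
    return True


def overlap_count(option_a, option_b):
    """Number of attributes on which two options of a choice task share a level."""
    return sum(1 for x, y in zip(option_a, option_b) if x == y)


# Toy design: 4 profiles described by 3 two-level attributes (hypothetical).
toy_design = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(is_level_balanced(toy_design))                # True
print(overlap_count(toy_design[0], toy_design[3]))  # 1
```

Note that `is_level_balanced` only compares levels that actually occur in the design; a level that never appears would need a separate check against the attribute definitions.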

Design of the questionnaire
Generic labels, test A and test B, were used to identify the options across all the choice sets, because the options do not reflect exact screening options and labels might encourage the use of a heuristic approach (31). An opt-out option (no preference for option A or B) was added, although there is a risk that an opt-out option could lead to high levels of non-response where a trade-off is judged to be difficult (32). Tests A and B were constructed in such a way that trade-offs were expected.
Questionnaires are built in such a way that pregnant women are asked which prenatal test they would prefer to have, whereas decision-makers are asked which prenatal test they would prefer to offer to pregnant women.

Stage 3: DCE survey design
Survey components include an introduction to the survey and explanation on its purpose, consent form, an explanation of attributes and levels, an example of DCE choice task followed by the eight/seven DCE choice tasks (for pregnant women and decision-makers, respectively), and questions on demographic data (age, sex (for decision-makers only), income) and on experience of having or knowing a child with disability.

Stage 4: Pilot testing the DCE Survey
A pilot project was undertaken to explore the feasibility of the survey that will be administered to a large sample of respondents (i.e., explore the understanding of the tasks, the complexity of the choices, the time needed to fill in the questionnaire), as well as the statistical relevance of the dimensions and levels of the first version of the final questionnaire. For practical reasons (a limited pool of potential respondents in the group of decision-makers), we could only test the questionnaires through this pilot project with pregnant women.
The pilot project was also used to elicit a dominant choice that will be added in the full-scale study with pregnant women and decision-makers.

Sampling and recruitment
A D-optimal design (33-35) was constructed by judiciously selecting 22 screening combinations, which led to a pairwise DCE pilot project (two scenarios per choice task) of C(22,2) = 231 pairs. This project was undertaken with 33 individuals, who were each asked to give their opinion on 7 different choice tasks (33 × 7 = 231). Details on the sample calculation for this pilot project can be found in additional file 3.
Participants had to be at a gestational age between 28 and 30 weeks. They were recruited among participants in a clinical trial (PEGASUS-2, ClinicalTrials.gov Identifier: NCT03831256) who had agreed to be solicited to participate in an additional study. At this stage, participants were asked either to choose the screening option that suited them best in each of the tasks or to state that they could not express a preference.
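
The arithmetic behind the pilot sample size can be reproduced in a few lines. This sketch assumes, as the figures in the text suggest, that each unordered pair of profiles is shown once to exactly one respondent; the actual allocation is described in additional file 3, and the variable names are illustrative.

```python
from math import comb

n_profiles = 22            # screening combinations retained in the D-optimal design
tasks_per_respondent = 7   # choice tasks answered by each pilot participant

n_pairs = comb(n_profiles, 2)  # unordered pairs of distinct profiles
n_respondents, remainder = divmod(n_pairs, tasks_per_respondent)

print(n_pairs, n_respondents, remainder)  # 231 33 0
```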

Data collection and analysis
Choices in the DCE choice task were collected automatically by the LimeSurvey platform. Eligible participants received an invitation message containing information regarding the nature of the study, and a link that led them to participate in the study. Once the informed consent form was given by clicking on the "accept" function, participants had access to the DCE survey.
Participants were given two weeks to answer the survey. After two weeks, the link to the non-answered questionnaire was inactivated and sent to new solicited participants until the sample size had been reached.
The preference data were coded in Excel and analysed using a conditional logit model (LOGISTIC procedure, SAS release 9.4) to estimate the effect of each attribute on the preferences of the participants. The significance level was fixed at 0.25% (36).
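
For readers unfamiliar with the conditional logit model, the sketch below shows how choice probabilities and the log-likelihood are computed for given preference weights. It is a generic illustration, not a reproduction of the SAS analysis: the attribute coding, the weights in `beta` and the function names are hypothetical, and the opt-out option is ignored because the text does not state how it was modelled.

```python
import math


def choice_probabilities(alternatives, beta):
    """Conditional logit probabilities for one choice task.

    alternatives: list of attribute vectors, one per option in the task.
    beta: list of preference weights, one per attribute.
    """
    utilities = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
    u_max = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - u_max) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def log_likelihood(tasks, chosen, beta):
    """Sum of log-probabilities of the chosen option over all tasks."""
    return sum(
        math.log(choice_probabilities(alts, beta)[c])
        for alts, c in zip(tasks, chosen)
    )


# Hypothetical task: test A vs test B described by two dummy-coded attributes.
task = [[1, 0], [0, 1]]
beta = [0.5, -0.5]  # illustrative weights, not estimates from the study
print(choice_probabilities(task, beta))
```

Estimation consists of finding the `beta` that maximizes `log_likelihood` over all observed choices, which is what a conditional logit routine does internally.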

Stage 1: Attribute Development
Attribute identification methods resulted in the identification of ten categories that could potentially become attributes of a test and that should influence, according to the respondents, the decision to expand the list of conditions screened for in a NIPS-based prenatal screening program. Each category contained multiple decisional points suggesting potential levels for each attribute. The source of each category is provided in Table 1. This step showed that pregnant women and decision-makers have many common preoccupations besides specific ones. Indeed, several preoccupations were shared by both groups, notably regarding the likely need to select which conditions to retain from the list of chromosomal anomalies that can be screened for by NIPS, the type of information resulting from the test that should be transmitted to parents, the test performance, the parents' out-of-pocket costs related to the test, and the level of certainty of the test result. Preoccupations regarding these categories could differ between pregnant women and decision-makers. For example, pregnant women wished to avoid excluding conditions, while decision-makers were concerned with limiting the choice to conditions whose disability can be prevented or mitigated thanks to a medical intervention. Sensitivity and specificity (false negative rate and false positive rate) issues were raised by all four decision-makers, but not by pregnant women. Decision-makers, unlike women, did not raise the question of the gestational age at which the test result is received, or the question of 'geographical access to the test'. They tended to place greater emphasis on the complexity of the test procedure and discussed the issue of the stress that the test might cause to women. Details are provided in Table 2.
Table 3 Potential attributes and levels for consultations and focus group discussion
Levels: 2. the confirmatory test is done in a regional hospital; 3. the confirmatory test can only be done in a hospital located in a city center

Attribute selection and framing were based on consultations with two decision-makers and a pregnant woman, and a focus group with one decision-maker (who had participated in the in-depth interviews and had agreed to participate in future studies) and three pregnant women (who had not participated in the first interviews). The consultations were done by email. Participants in the consultation process proposed to eliminate the two attributes mentioned only by decision-makers, test specificity and test sensitivity, since these medical terms might carry little meaning for pregnant women and were difficult for a layperson to understand. This was confirmed by the focus group discussion.
The consultations and the focus group discussion also led to the rewording of some attributes and levels in order to facilitate the comprehension of the questionnaire (for example, 'test procedure' became 'test sufficiency'). The process reduced the questionnaire to eight attributes. Five attributes had two levels and three attributes had three levels. Amongst the attributes, the levels of 'cost related to the test' were built based on what women said about how much out-of-pocket money they were ready to pay for the test, as well as on the real cost of the NIPS test presently offered in the public health care system. Details are provided in Table 4.
Table 4 List of attributes and attribute levels for the pilot project
Attribute 1. Conditions to be screened
Level 1. can be treated or lead to a termination of pregnancy
Explanation: A test can detect as many conditions as possible, provided that in case of a positive result, medical intervention is then possible.
Level 2. can be treated or lead to termination of pregnancy and should not be rare
Explanation: A rare disease is defined as a condition that affects less than one in 200,000 individuals. This test would therefore make it possible to detect diseases that are rarer than Down syndrome, which affects 300 children out of 200,000 births.
Attribute 2. Test performance (degree of accuracy of the test result)
Level 1. known
Explanation: In a few cases, the result of a screening test is incorrect. When the percentage of error is known, the mother can be told what the probability is that a second test, called a confirmatory test, which is rarely wrong, will confirm or reject the first test result.
Level 2. uncertain
Explanation: In a few cases, the result of a screening test is incorrect. When the percentage of error is uncertain, the probability that a second, confirmatory test, which is rarely wrong, will confirm or invalidate the first test result cannot be specified. An uncertain result is common for rare diseases.
Attribute 3. Moment at gestational age to obtain the test result
Level 1. before or at the third prenatal visit
Explanation: The result is communicated at the latest during the third prenatal visit, around the 24th week of pregnancy.
Level 2. before or at the second prenatal visit
Explanation: The result is communicated at the latest during the second prenatal visit, around the 18th week.
Attribute 4. Degree of test result certainty to the severity of the disability
Level 1. the child is certain to have a severe physical and/or intellectual disability that will affect the child's quality of life
Explanation: The result may detect a physical or intellectual problem that will lead to a severe disability that will affect the child's quality of life.
Level 2. the child may have the disease. However, having the disease does not necessarily mean that the child will have a severe physical and/or intellectual disability
Explanation: The result can detect an intellectual or physical problem but does not indicate the severity of the disability.
Explanation: The information is about the possibility that the child may have a disability and the medical consequences of the disability which may require treatment.
Level: the risk of disability, its medical and social implications
Explanation: The information is about the possibility that the child may have a disability, the medical consequences of the disability, and the social impact of the disability on the life of the child and family.

Figure 2 illustrates an example of a choice task in the DCE survey. An example of the full survey is available on request.

Stage 4: Results of the pilot-test
The pilot project conducted with 33 pregnant women (aged from 19 to 37 years) was completed within 10 weeks (from 5th July to 12th September 2021). Out of a total of 115 invitations sent, 68 women agreed to participate in the study. Among these, 36 full responses were received (53% response rate).
The pilot project revealed that participants took an average of 11 minutes to complete the questionnaire (minimum 2 minutes; maximum 49 minutes).
This final DCE questionnaire contains seven attributes (five attributes with 2 levels, one attribute with 3 levels and one attribute with 6 levels) (Table 6).

include in the data used to support their decision, comparable information provided by groups that can have different preoccupations, but whose concerns, from a social perspective, are equally relevant.
The instrument shows similarities in its attributes with other DCE instruments built for use in studies on prenatal screening of the fetus (37-40). All instruments have attributes related to the level of information provided by the test results, to the time, in gestational age, at which the results of a screening test are received, and to the test sufficiency, i.e., the impact of the test on the need for further invasive or non-invasive procedures to confirm a screening result. All instruments have a 'test performance' attribute, sometimes presented in different terms such as 'detection rate' or 'accuracy rate' (37-40). Unsurprisingly, this concern was also shared by the decision-makers involved in the construction of our instrument, but described in terms of 'false-negative' and 'false-positive' rates.
However, two dimensions that are present in our DCE instrument are not found in other DCE instruments: one regarding which conditions should be screened for, and one regarding which level of certainty of the test result about the severity of a disability could be accepted. The discrepancy is probably due to the fact that other DCE instruments focus on the use of tests to screen for common chromosomal anomalies, for which the performance of the test is high. Our instrument targets potential chromosomal conditions that are not currently screened for but could be added to a screening program. The performance of the detection test for these conditions tends to be lower. These conditions also tend to lead to a wide range of phenotypes, and hence of severity of disability (37,39,41-44).
Moreover, the construction of the DCE instrument shows that a consensus can be reached on a final version, even if some attributes have been proposed by only one group. This probably reflects the respect and interest of all participants for attributes that were of little concern to them before the study, but whose mention aroused their interest. To our knowledge, only one DCE instrument, in the field of pharmacy subsidy decisions, might have been applied to both patients and decision-makers (45). Yet it remains uncertain from that paper whether both the public and health decision-makers or experts were effectively involved in the identification of the questionnaire attributes. The paper does not provide information on whether and how a consensus on the DCE instrument attributes between both groups was sought (45,46).

Conclusion
This study shows that it is possible to develop a DCE instrument to elicit values for an intervention that reflects both the demand and the supply sides of the health care system. By expanding the range of stakeholders involved in the valuation of an intervention, such an instrument might contribute to the efforts deployed to address the societal value of an innovation. Yet a DCE study is resource-consuming and might give results that are difficult to explain to the target audience. The acceptability of this approach is therefore an issue. Research on the added value of DCE in the world of economic evaluation is still warranted.

Ethics approval for this study was obtained from the Comité d'éthique de la recherche du CHU de Québec-Université Laval (project 2020-4877), and permission was granted to enroll participants at the CHUL hospital in Quebec City (Canada).