Effectiveness will be examined in a parallel-group superiority randomized controlled trial (RCT) with a baseline assessment, a post-test after four weeks and a follow-up assessment six to eight weeks later. The RCT includes two groups: an experimental group who play the serious game ‘You & I’ and a control group who will be placed on a waitlist. The method of this study is reported according to the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT; see Additional file SPIRIT checklist).
A total of 172 adults with MBID, aged 18 years or above, will be recruited from the population of four Dutch care organizations specialized in disability care (Bartiméus, Ons Tweede Thuis, Cordaan and ASVZ). In addition, the opportunity to participate will be announced through other channels, for example the website ‘www.socialerelatiesenict.nl’, social media and various meetings. A diagnosis of MBID (IQ between 50 and 85) needs to be reported, for instance by the care organization where the participant receives care. Inclusion criteria are basic computer skills and access to a computer with an internet connection. Adults need to give written consent for participation in the study, as does their legal representative where necessary. Adults who are deaf and/or blind, and adults with serious mobility impairments who cannot operate a computer without aids, are excluded from participation.
Sample size calculation
Linear mixed effects modelling with two conditions and three repeated measures will be conducted to analyze the effects on mentalizing abilities and stress regulation. The sample size is estimated from previous studies among people with intellectual disabilities that measured perspective taking, a mentalizing ability, with the Perspective Taking (PT) subscale of the Interpersonal Reactivity Index (IRI), and stress regulation with the Lifestress Inventory (LI). Using the means reported for both measures in those studies, a desired power of 0.90 and an alpha of .05, it is estimated that around 144 participants are needed, as calculated in GLIMMPSE. Because there are three assessments, a drop-out of 20% will be taken into account. Thus, 172 participants will be recruited and randomized into two groups of approximately 86 participants each.
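The step from 144 to 172 participants can be illustrated with a short calculation. This is a minimal sketch, assuming the 20% drop-out margin is added proportionally to the required sample and the result is rounded down to an even number so that both groups are equal in size; the protocol does not state the rounding rule, and the function name is hypothetical.

```python
import math


def inflate_for_dropout(n_required: int, dropout_rate: float) -> float:
    """Add a proportional margin for expected drop-out: n * (1 + rate).

    Assumption: the protocol's margin is applied this way; it is not
    spelled out in the text.
    """
    return n_required * (1 + dropout_rate)


n_required = 144      # from the GLIMMPSE power calculation
dropout_rate = 0.20   # expected drop-out across three assessments

n_inflated = inflate_for_dropout(n_required, dropout_rate)

# Round down to an even total so the two groups can be equal in size.
n_recruit = 2 * math.floor(n_inflated / 2)
n_per_group = n_recruit // 2

print(n_recruit, n_per_group)
```

Under these assumptions the calculation reproduces the reported total of 172 and the group size of 86.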
Study procedure and randomization
Individuals who sign up for the study, whether via a care organization, via the internet or via any other route, and who meet the inclusion criteria will receive an information brochure. Persons who want to participate in the study are asked to sign the consent form and return it to the researcher. In case of legal incapacitation, the legal representative of the participant is asked to sign and return a consent form on behalf of the participant.
Data collection for each participant takes 12 to 14 weeks. Participants will be assessed at: baseline (T0), post intervention (T1, five weeks after baseline), and follow up (T2, six to eight weeks after T1). During all three assessments, participants fill out a set of digital questionnaires (for all measurements, see Measures). A software program (e.g. Qualtrics or Survalyzer) will be used to gather data through the digital questionnaires. Completing the questionnaires will take up to 90 minutes per assessment. During all assessments, an independent researcher is present to assist the participants with completing the questionnaire at home or in their care home. The researchers will follow a standard protocol on how to assist the participants.
After informed consent and the baseline assessment, which is conducted blind to group allocation, participants will be individually randomized into two groups using stratified randomization combined with block randomization with varying block sizes of 4 and 6. To balance contextual factors, randomization will be stratified by care organization. An independent researcher will produce the allocation schedule using a computerized random number generator and will afterwards conceal the schedule from the researchers. Blinding is only possible at the baseline assessment; after that, both participants and researchers know which group participants have been assigned to.
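The allocation procedure can be sketched in a few lines of Python. This is an illustration only, not the software used in the study: the seed, the number of slots per care organization and all function names are hypothetical, and the sketch assumes one allocation list per care organization built from randomly ordered blocks of size 4 or 6 with equal numbers of each arm.

```python
import random
from collections import Counter

# Strata named in the protocol; the two arm labels are illustrative.
ORGANIZATIONS = ["Bartiméus", "Ons Tweede Thuis", "Cordaan", "ASVZ"]
ARMS = ("experimental", "waitlist")


def make_block(size, rng):
    """Return one shuffled block with an equal number of each arm."""
    block = [ARMS[0]] * (size // 2) + [ARMS[1]] * (size // 2)
    rng.shuffle(block)
    return block


def allocation_schedule(n_slots, rng, block_sizes=(4, 6)):
    """Concatenate blocks of randomly varying size until n_slots is reached."""
    schedule = []
    while len(schedule) < n_slots:
        schedule.extend(make_block(rng.choice(block_sizes), rng))
    # The final block may be cut short, so a small imbalance can remain.
    return schedule[:n_slots]


def build_schedules(slots_per_stratum, seed):
    """Build one independent allocation list per stratum (care organization)."""
    rng = random.Random(seed)  # fixed seed keeps the schedule reproducible
    return {
        org: allocation_schedule(n, rng)
        for org, n in slots_per_stratum.items()
    }


# Hypothetical: 48 slots per organization, arbitrary seed.
schedules = build_schedules({org: 48 for org in ORGANIZATIONS}, seed=2018)
for org, schedule in schedules.items():
    print(org, dict(Counter(schedule)))
```

Varying the block size makes the next allocation harder to predict than a fixed block size would, while each completed block still keeps the two arms balanced within a care organization.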
After randomization, participants in the experimental group will be offered the serious game ‘You & I’, while participants in the control group will be placed on a waitlist. Participants in the experimental group will be asked to play the serious game on their own computer device at home or on a computer device of their care home. They have to complete eight gaming levels within four weeks, playing the game twice a week. To remind the participants to play the game, they will receive an impersonal email or text message on their phone twice a week asking whether they have already played the game. Anonymous digital game statistics will measure the compliance of the participants (how often the computer game has been completed). After four weeks, the post-intervention assessment is administered, and six to eight weeks later the participants complete the follow-up assessment. Participants in the control group can play the serious game after they have completed the follow-up assessment.
[Add Figure 1. Overview of the study timeline about here]
The intervention is a serious game called ‘You & I’ that focuses on the improvement of mentalizing abilities, including the regulation of stress. The second and last authors developed the serious game in collaboration with adults with MBID and healthcare professionals. The serious game ‘You & I’ is based on attachment theory, the practice-oriented book ‘Mentalization in clinical practice’ by Allen, Fonagy & Bateman and the practice-oriented book ‘Mentalization can be learned’ (in Dutch: ‘Mentaliseren kan je leren’) by Dekker-van der Sande & Sterkenburg. The participant with MBID can play the game independently on a tablet or computer.
The serious game revolves around a main character called Mo, who the player follows throughout the game by watching videos. In the first level, the player learns that Mo is sad because he misses his friend Emily, who moved to the United States. He decides to visit her and travel to the United States. The player will follow Mo on his adventure, while he leaves his house, takes the bus, the airplane and finds his way through a foreign country to finally be able to visit Emily.
The game consists of eight gaming levels, which will take about 30 to 45 minutes to complete. The participant is asked to play the game twice a week, completing one level each time. Each level has the same structure, consisting of eight different elements, including videos following Mo’s journey, multiple choice questions, an emotion picture game, a stress measurer and a game about stress. The gaming levels cover different domains of mentalization, as described by Choi-Kain & Gunderson. Table 1 provides an overview of the themes and the domains of mentalization that are covered in each level. The first six gaming levels each cover a different domain of mentalization; levels seven and eight are so-called ‘booster levels’, which repeat and apply all domains of mentalization. By integrating the different domains of mentalization in the levels of the serious game, the player is expected to improve their mentalizing abilities and learn to cope better with stress.
All data will be collected through computerized assessments at baseline, post-intervention and follow-up assessment. Participants can fill out the digital questionnaires at home or in their care home. When needed, participants will receive support of an independent researcher, who will be present during all assessments and who will follow a standard protocol on how to assist the participants. Figure 2 provides an overview of the measures and time of assessment.
Minimal Dataset (MDS, T0)
To measure demographic variables, the minimal dataset (MDS) ‘Basic MDS’ and ‘Basic MDS for adults with an intellectual disability’ will be used, including the Personal Wellbeing Index—Intellectual Disability (PWI-ID) . The MDS is a set of questions on demographic variables for everyone who collects data of persons with intellectual disabilities. The MDS focuses on questions in the following domains: personal factors, personal development and personal well-being. The questionnaire consists of 32 items, measuring e.g. gender, age and intellectual functioning.
Social Validity Scale (SVS, T0, T1)
The SVS is a questionnaire consisting of 15 questions measured on a 5-point Likert scale to assess the desirability, applicability, clarity and efficiency of the intervention procedure. In this study, the scale will be used as described by Janssen, Riksen-Walraven, and Van Dijk and by Jonker, Sterkenburg, and Van Rensburg. At the baseline assessment, participants answer questions concerning their expectations of the serious game ‘You & I’; at the post-intervention assessment, they answer questions concerning their experiences with playing the game.
Autism Spectrum Quotient (AQ–10, T0)
The AQ–10 measures the degree to which adults with average intelligence exhibit autistic traits. The self-report questionnaire consists of 10 items measured on a 4-point Likert scale with scores ranging from definitely agree (1) to definitely disagree (4). The first items of the measure are: ‘I often notice small sounds when others do not’ and ‘I usually concentrate more on the whole picture, rather than the small details.’ Within a typically developing population, the AQ–10 performs well at discriminating between individuals with and without a clinical diagnosis of autism spectrum disorder. The AQ–10 has not yet been used for adults with intellectual disabilities. Therefore, the items were adapted for persons with MBID. The adaptations were made by three authors (SW, MW, PS) and checked by collaborating researchers with MBID to align with our target group.
The primary outcome measure of this study is mentalizing abilities. Several questionnaires will be used to measure mentalization, each measuring a different component of mentalization i.e. reflective functioning, perspective taking, emotion recognition and the attribution of mental states. No specific effect is expected and therefore, the aggregate of the measures is tested to investigate what components are affected by the intervention.
The Reflective Functioning Questionnaire (RFQ, T0, T1, T2)
The RFQ is a brief self-report screening measure of mentalizing abilities. It consists of 8 items measured on a 7-point Likert scale with scores ranging from strongly disagree (1) to strongly agree (7). The first three items are: ‘People’s thoughts are a mystery to me’, ‘I don’t always know why I do what I do’, and ‘When I get angry, I say things without really knowing why I am saying them’. Psychometric properties are good in a typically developing population and in patients with personality disorders. For the purpose of this study, the measure was adapted for adults with intellectual disabilities by removing unnecessary wording and simplifying concepts. The adaptations were made by the first three authors and checked by collaborating researchers with MBID. Moreover, eight experimental items from the RFQ–54 were added to the questionnaire. The instrument was translated into Dutch by the second author and then translated back into English by the last author; adjustments were made where necessary. Any ambiguity was discussed in correspondence with the developers of the instrument. Because of these adaptations, the existing psychometric data do not apply to the adapted version.
Radboud Faces Database (RaFD, T0, T1, T2)
The RaFD is a set of pictures depicting different emotional expressions and is used to assess emotion recognition as a part of mentalization. Participants view color photographs of unfamiliar faces portraying ten different Caucasian and Moroccan adults, each displaying five emotions (anger, fear, happiness, sadness and neutral). A selection of 50 photographs has been made based on the percentage of agreement on emotion categorization, mean intensity, mean clarity, mean genuineness of the emotion and mean valence of the photograph. The pictures include averted gaze orientations (left and right) as well as direct gaze orientations (frontal). Participants have to indicate for each photograph which one of the five emotions the adult depicts. The RaFD has good psychometric qualities in a typically developing population, with an average agreement between chosen and targeted emotions of 82% (median 88%, SD 19%).
Subscale Perspective Taking (PT) of the Interpersonal Reactivity Index (IRI, T0, T1, T2)
The IRI is a multidimensional tool measuring interpersonal reactivity. The self-report questionnaire consists of 28 items measured on a 5-point Likert scale with scores ranging from does not describe me well (1) to describes me very well (5). The measure has 4 subscales, each made up of 7 different items. In this study, only the PT subscale will be used. The PT subscale measures the tendency to take the psychological point of view of others. The first three items are: ‘I sometimes find it difficult to see things from the “other person’s” point of view’, ‘I try to look at everybody’s side of a disagreement before I make a decision’ and ‘I sometimes try to understand my friends better by imagining how things look from their perspective’. The reliability of the subscale is adequate, with a Cronbach’s α of .73. A modification of the subscale has previously been used in research on adults with moderate or mild intellectual disabilities, which also indicated adequate reliability for this population, with a Cronbach’s α of .71.
Frith Happé Animations Test (T0, T1, T2)
The Frith-Happé Animations Test is added because this measure has previously been used with children with intellectual disabilities. It is a non-verbal task to measure mentalizing abilities and is therefore a good addition to the other, verbal questionnaires. The Frith-Happé Animations Test consists of a series of computer-presented animations, lasting 34–45 seconds each. All animations feature one large red and one small blue triangle moving around the screen. There are three types of animations. First, Theory of Mind (ToM) animations, in which it is suggested that one triangle anticipates or manipulates the ‘mental state’ of the other. Second, goal-directed action (GD) animations, in which the interaction between the triangles can be described in terms of behavioral interaction. Third, random (Rd) animations, in which the triangles move around purposelessly without reference to interactions, goals or intentions.
After each animation participants are asked: ‘What was happening in the animation?’ Verbal descriptions are recorded and scored for complexity of mental state terms used (i.e. intentionality; 0–3) and accuracy of the answer given (i.e. appropriateness; 0–2). Participants are presented with two practice animations (GD and ToM) to ensure they understand the task.
The secondary outcome measure of this study is stress regulation.
Lifestress Inventory (LI, T0, T1, T2)
The LI is a 30-item self-report questionnaire which can be used to measure general worry, negative interpersonal interactions and competency concerns. Participants are first asked to indicate whether they have experienced a stressor. If they have not, participants move on to the next item. If they have, they select one of four answers to indicate the impact of the stressor, ranging from no stress (1) to a great deal of stress (5). The first three items are: ‘Do people treat you as though you are different?’, ‘Have you been getting on with your partner/girlfriend/boyfriend?’ and ‘Have you heard people you know arguing?’. The LI is reliable for use with people with ID, with a Cronbach’s α of .85.
Perceived self-efficacy scale (stress, T0, T1, T2)
This is a short 9-item questionnaire which can be used to measure perceived self-efficacy regarding stress regulation. The questionnaire was designed by the researchers of this study using Bandura’s guide for constructing self-efficacy scales and is specifically focused on the skills that are taught in the serious game ‘You & I’. Self-efficacy is concerned with people’s expectations of executing a particular skill, in this case stress regulation. We expect that if people are aware of actions that help to regulate stress, this will lead to better stress regulation. Participants are asked on a scale from 0 (not at all sure) to 10 (very sure) how certain they are that they can know, feel and cope with stress. The first three items are: ‘Feel in my body when I have stress’, ‘Deal with stress well’ and ‘Know that I have stress’.
[Please add Figure 2 about here]
All statistical analyses will be conducted using SPSS version 24.0. Descriptive statistics will provide insight into the characteristics of the participants. Before the analyses, outliers will be checked and, if necessary, winsorized, and a partial intention-to-treat analysis will be performed. Demographic variables will be used to test for baseline differences between the experimental and control groups and will be added as covariates if differences are found. For social validity, average item scores will be reported.
Primary and secondary outcome measures of the study (i.e. mentalizing abilities and stress regulation) will be assessed using linear mixed effects modeling. A mixed model will be fitted in SPSS with Subject as the highest level and with Group, Time and the Group × Time interaction entered as fixed effects. Furthermore, the analyses will control for compliance, measured through anonymous digital game statistics (how often the computer game is completed), and for care organization as the stratifying variable.
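For concreteness, the model can be written out as an equation. The notation is an assumption made for illustration and is not taken from the protocol: it assumes a random intercept per participant, Time coded as two indicator variables with baseline (T0) as the reference, and the control covariates (compliance and care organization) collected in a vector.

```latex
\[
\begin{aligned}
Y_{ij} &= \beta_0 + \beta_1 G_i + \beta_2 T^{(1)}_{j} + \beta_3 T^{(2)}_{j}
          + \beta_4 G_i T^{(1)}_{j} + \beta_5 G_i T^{(2)}_{j}
          + \boldsymbol{\gamma}^{\top}\mathbf{x}_i + u_i + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{aligned}
\]
```

Here $Y_{ij}$ is the outcome of participant $i$ at assessment $j$, $G_i$ equals 1 for the experimental group and 0 for the waitlist group, $T^{(1)}_{j}$ and $T^{(2)}_{j}$ indicate the post-intervention and follow-up assessments, $\mathbf{x}_i$ holds the covariates, and $u_i$ is the participant-level random intercept. In this parameterization the intervention effect is carried by the interaction coefficients $\beta_4$ and $\beta_5$, which express how much more the experimental group changes from baseline than the waitlist group.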
Data management and monitoring
Data will be collected using online survey software. Computerized data will be stored on a secured server of the Vrije Universiteit Amsterdam. The participant’s privacy is guaranteed by assigning a unique identification number to every participant. Data will be processed using these identification numbers. All researchers who will work with the research data will sign a non-disclosure agreement, stating they will not share personal details of participants with a third party. The handling of the data will comply with the General Data Protection Regulation (GDPR). A data management plan was submitted and accepted by the funding organization of the study (ZonMw), project number 845004004. This study is also embedded in the Amsterdam Public Health (APH) research institute. The quality committee of APH offers a handbook to safeguard the quality of the research and performs random audits.
Ethical approval was given by the Medical Ethics Committee of the University Medical Center Amsterdam location VUmc, the Netherlands (METc VUmc 2018.007, NL60353.029.17) and the Institutional Review Board of the Faculty of Behavioral and Movement Sciences of the Vrije Universiteit Amsterdam (VCWE–2017–171). Potential future changes to the study will be proposed to the Medical Ethics Committee as amendments, and will be described and discussed in publications of this study hereafter.