Design and implementation of a replication study: The Music for Autism (M4A) binational assessor-blinded randomised crossover trial

doi:10.21203/rs.3.rs-2478719/v1

Many autistic children suffer from social communication problems, reduced participation, and mental health issues. Music therapy has beneficial but heterogeneous effects; its mechanisms are incompletely understood. The Music for Autism (M4A) trial aims to replicate and expand a previous trial examining brain mechanisms and clinical outcomes of music versus play-based therapy for autistic children. This paper presents M4A’s design and implementation; we examine feasibility of this replication trial based on the first wave of recruitment. M4A is a crossover randomised controlled trial currently conducted at two sites (Bergen, Norway; Vienna, Austria). Children aged 6-12 years diagnosed with autism spectrum disorder will be randomised to a sequence of weekly individual music and play-based therapy (3 months for each intervention; 3-month washout period). Outcomes assessed before and after each intervention period include communication (blinded, assessed by teachers); functional brain connectivity (from functional magnetic resonance imaging, fMRI); and further behavioural and biological outcomes. The planned total sample size of 80 will ensure adequate power. Recruitment in the first wave (14 randomised) was below expectations. Baseline characteristics were similar to the previous study, but some variables (severity, functioning) were difficult to assess and compare. A range of functioning levels was included. Interventions were well accepted; fidelity was adequate. Timing of interventions and assessments was challenging when blinded assessors were hard to reach. Blinding was successful. Movement in fMRI was an issue in some children, but we developed preparation strategies that also help low-functioning children to complete the scans successfully. No study-related adverse events occurred. M4A has a strong design, appears feasible, and promises important new insights into the mechanisms and outcomes of music therapy for ASD.
Multinational replicability of controlled trials of complex psychosocial interventions in ASD combining clinical and brain imaging outcomes can be challenging and requires careful planning.

Trial registration: Clinicaltrials.gov NCT04936048

* AGr and MR contributed equally to this work and share first authorship. CG, GS, and KS contributed equally to this work and share last authorship.

Health sciences/Diseases/Psychiatric disorders/Autism spectrum disorders

Health sciences/Biomarkers/Predictive markers

Biological sciences/Psychology/Human behaviour

Autism spectrum disorder (ASD) is a lifelong neurodevelopmental condition, characterised by persistent challenges in social interaction and communication, as well as restricted and repetitive activities, interests, and behaviour. Although it affects up to 1% of the population¹, its impact on the individuals, their families and communities, and on the healthcare system and society, is still underestimated. Up to 70% of autistic individuals are diagnosed with comorbid conditions such as cognitive impairment, depression, attention deficit hyperactivity disorder (ADHD), anxiety, or epilepsy¹. This is reflected in high costs of healthcare services, informal care, lost productivity², and even reduced life expectancy compared to neurotypical individuals³. Limited understanding of the underlying pathophysiology and the high variation within the autism spectrum also contribute to a situation that leaves affected individuals and healthcare professionals with limited treatment options and research with considerable gaps to be addressed. Behavioural interventions have traditionally addressed core symptoms, but this focus may not adequately reflect the real and complex mental health needs of people with autism⁴.

The notion of a connection between autism and music is as old as the first reported cases of autism⁵, and music has been used as a therapeutic tool for many decades⁶. However, systematic and robust scientific investigation of the effects of music therapy (MT) in this population began only recently⁷. Considering that music is regarded as a "social art"⁶ and many challenges for individuals with autism are found in the social domain, possible effects of MT in developing social and emotional skills have been assumed. Based on 26 randomised controlled trials (RCTs) conducted since the 1990s⁸, MT is associated with beneficial outcomes in global improvement, symptom severity, and quality of life⁸, but a large RCT suggested that it may not change core symptoms as measured with diagnostic tools⁹. Exploratory subgroup analyses¹⁰ have suggested that MT might be most beneficial for younger and lower-functioning individuals; this may be consistent with a view of music as a more fundamental communication medium than language⁷. Moving beyond clinical observations, a better understanding of the working mechanisms of MT may be achieved by connecting clinical outcomes with neuroscientific measurements, such as resting-state functional connectivity from functional magnetic resonance imaging (fMRI)⁷. Only one RCT to date has combined clinical outcomes and functional brain connectivity¹¹; there is a need for replication and expansion of this knowledge.

Various alterations of functional connectivity between different areas of the brain are known in ASD. Reproducible alteration patterns include hypoconnectivity in the sensorimotor cortex and hyperconnectivity in the prefrontal and parietal cortex¹². Earlier assertions of long-range cortical hypoconnectivity have been replaced by more idiosyncratic descriptions¹³. Adding a developmental perspective, in prepubescent children, both short- and long-range connectivity may be stronger in ASD than in typically developing children¹⁴. Variations in methodology¹⁵ and limited sample sizes¹⁶ limit replicability. In MT, Sharda et al.¹¹ chose 6 target seed areas (bilateral Heschl’s gyrus, HG; inferior frontal gyrus, IFG; temporal pole, TP) because of their relevance to communication and based on an earlier study of sung versus spoken word perception¹⁷. A seed-based connectivity analysis suggested that MT, compared to a non-music play-based therapy (PT), increased connectivity between left HG and subcortical and striatal regions, and reduced connectivity between the right HG and visual areas. Connectivity changes were correlated with improvements in communication scores rated by parents. The findings can be interpreted as reduced hypo- and hyperconnectivity, respectively, although such interpretation is limited by the lack of a neurotypical control group. Independent replication is warranted.

The overall goal of M4A is to replicate and expand Sharda et al.’s findings by investigating if 12 weeks of intervention through MT, compared to PT, improve social communication skills and other relevant clinical outcomes in children with ASD, and if this is accompanied by a change in functional brain connectivity and other biological and behavioural outcomes. M4A uses the same population, interventions, and outcomes as Sharda et al., with a larger sample size and in different countries, but adds further outcomes and potential mechanisms: cultural participation (not examined previously, but relevant in light of changing views of autism as a trait rather than a disorder⁸,¹⁸); chronic stress (a significant but under-researched issue for people with ASD⁴; known to impede learning¹⁹; and modifiable through music²⁰); structural brain changes (grey matter volume, GMV, and white matter volume, WMV; modifiable through psychological and biological therapies²¹ and musical training²²); predictive processing (social action and music; indicating brain function differences between people with ASD and neurotypical individuals²³,²⁴ and explaining pleasure from music²⁵); microbiome (correlating with behavioural alterations; either causal or mediated by dietary preferences²⁶,²⁷). Prediction tasks and microbiome data collection were added after the initial proposal for funding and are exploratory add-on projects. Detailed hypotheses for the main trial are described in Supplementary methods. The aim of the present paper is to describe the M4A trial protocol and to examine its feasibility and risks, such as slow recruitment; treatment infidelity or inconsistency; unreliable assessment; and unforeseen risks; based on the first cohort recruited.

Study design, participants, and settings

M4A is an assessor-blinded crossover RCT (Fig. 1) comparing MT to a structurally matched PT intervention, combining behavioural and neuroscientific outcome measures (Fig. 2). Eligible children are aged 6-12 years and have an ASD diagnosis from a licensed health professional with an ICD code²⁸, ideally supported by scores from standardised tools (ADOS²⁹, ADI-R³⁰) and assessment of intelligence quotient (IQ) or level of ability. Children are excluded if they received individual MT during the last 6 months; had generalised epileptic seizures during the last 12 months; have more than 12 months of cumulative music training; or have conditions precluding fMRI scanning (such as metallic or electronic implants, claustrophobia, or persistent problems in complying with scanning procedures). Participants are recruited in two cities (Bergen, Norway; Vienna, Austria) by advertising through social media and reaching out to relevant groups such as parent associations, schools, clinicians, and special educators. Baseline assessment includes demographic data, behavioural variables, MRI scans, and other biological data (Fig. 2). Participants are randomised to a sequence of interventions (MT-PT or PT-MT). The first intervention takes place over 3 months, followed by a washout period of 3 months, and then by 3 months of the second intervention (Fig. 1). When possible, enrolment is scheduled so that interventions will take place during the school year and washout periods during school holidays. We expected around 27 participants to be randomised each half year across both sites combined. Outcome assessments are conducted before and after each intervention period, leading to a total of four time points (Fig. 1, 2). Changes to the protocol since the initial version are listed in Supplementary Table S1.

Interventions

Locations (a dedicated room at each site) and materials (matched to those in Sharda et al. as closely as possible) were standardised to ensure consistent implementation in Bergen and Vienna. Both interventions consist of 12 weekly one-on-one sessions, 45 minutes each, conducted in the same setting by a licensed music therapist. Although we initially planned to conduct MT and PT in accordance with manuals from Sharda et al., it became necessary to further specify and develop them to provide sufficient detail for replication across therapists and sites. The treatment manual and the fidelity assessments will be further described in separate publications. Briefly, both MT and PT use a developmentally oriented approach with the overall aim of improving quality of life³¹; they seek to create a shared experience, build meaningful relationships, and foster self-expression. A varied set of activities, combining therapist- and child-led interactions, target common goals including sensory integration, social communication, and emotional development. Children can choose 4 activities per session using a visual schedule to facilitate communication and provide structure to the sessions³². MT uses rhythmic cues, musical instruments (e.g. piano, djembe, xylophone, ukulele), songs, and stories accompanied by songs. PT is designed as a play-based active comparison condition to control for factors such as support, therapist attention, positive expectancies, and emotional engagement. It uses verbal interaction, toys (e.g. Lego, finger puppets, Play-Doh, puzzles), and stories (when possible, the same as in MT), without a musical component. Treatment fidelity³³,³⁴ is monitored and ensured via therapists’ reports, supervision meetings, and video recordings of sessions. For every child, three videos from each intervention arm (first, middle, and last session) are assessed by two independent raters.
The fidelity checklist covers dimensions such as dose (number of sessions); structure (number and type of activities); content (use of musical or verbal reinforcements, use of feeling cards); and processes (therapist’s attunement, therapist’s responsiveness). In addition, child engagement and interaction levels are investigated. Differentiation between MT and PT is analysed by comparing the amount of musical and verbal reinforcement used.

Outcomes

Primary clinical outcome

The primary clinical outcome is social communication after 3 months (end of each therapy) as assessed by a blinded assessor using the General Communication Composite (GCC) standard score of the Children’s Communication Checklist (CCC-2).³⁵,³⁶ The CCC-2 measures aspects of pragmatic communication with 70 items across 10 subdomains (speech, syntax, semantics, coherence, inappropriate initiation, stereotyped language, use of context, nonverbal communication, social relations, and interests) and has shown high interrater reliability and internal consistency.³⁵ The GCC standard score is standardised to M=100 (SD=15), with higher scores indicating better social-communication skills (reversed from raw scores). Whereas Sharda et al. relied on parent reports for this outcome, where attempted blinding was unsuccessful¹¹, M4A relies on a special educator/teacher who knows the child well. Success of blinding is verified at the last follow-up by asking the assessor about incidental unblinding.

Secondary clinical outcomes, measured at the same time points as the primary clinical outcome, include 2.) participation (Child and Adolescent Scale of Participation, CASP); 3.) family quality of life (Beach Center Family Quality of Life Scale); 4.) receptive vocabulary (Peabody Picture Vocabulary Test, PPVT; conducted by a health professional); 5.) symptom severity (Social Responsiveness Scale, SRS); 6.) adaptive behaviour (Vineland Adaptive Behavior Scales); 7.) chronic stress (hair cortisol); 8.) adverse events; and 9.) preference. Participation and quality of life are the two main secondary clinical outcomes because of their particular importance to participants. All secondary clinical outcomes except receptive vocabulary and chronic stress are parent-rated; for details, see Supplementary methods.

Brain connectivity of frontotemporal regions will be derived from resting-state fMRI, a measure of the functional connectivity between brain areas under a condition where no concrete task is performed. Six seed areas will be used as the main neuroscientific outcome. Seeds are anatomically defined regions of interest (ROIs) in Montreal Neurological Institute space for the left and right HG, IFG, and TP. These seeds anchor frontotemporal networks involved in language and communication and may be altered in ASD and modified by MT.¹¹ The timeseries for each of the seeds will be used to generate individual participant-level maps using whole-brain general linear models at baseline and after the interventions. First-level maps will then be entered into the second-level analyses. Results will be reported with a 5% significance level adjusted for multiplicity by familywise error rate. Z-scores of parameter estimates will be used to measure connectivity strength. Structural brain changes (GMV and WMV) will be assessed using voxel-based morphometry (VBM), derived from the anatomical T1 image which is acquired at the beginning of each scanning procedure. ROIs include the 6 seeds above as well as other areas identified in our previous review (cerebellum, superior temporal sulcus, temporo-parietal area).⁷ Scanning parameters were harmonised across sites (see Supplementary methods). The brain scan procedure requires children to be as still as possible for approximately 16 minutes inside the MRI scanner. To facilitate the procedure, children are shown movies during the scan – an entertaining movie (Tom and Jerry or a preferred movie) during the T1 scan, preparatory scans, and breaks between scans, and a specifically designed video³⁷ during resting state. 
To help families prepare, we provide parents with a movie showing the surroundings, the procedure, and the preparation for the scan; a sound file with the recorded scanner noise; and an individualised visual schedule for the visit (Supplementary Figure S1). On the day of the scan, researchers maintain a calm and patient attitude to increase the likelihood of cooperation and inform radiographers about any specific details concerning the child. After a successful scan, children receive a toy MRI scanner set (Playmobil) as a reward. When needed, we schedule an additional in-person meeting at the scanning facility prior to the baseline scan to familiarise the child with the environment and the researchers.

Additional exploratory outcomes added after the initial protocol include microbiome data and performance in behavioural prediction tasks (see Supplementary methods).

Sample size and power

Sharda et al. reported a mean difference of 4.84 and an effect size of d=0.34 (SD not reported but calculated as 4.84/0.34≈14.2) on the CCC-2. Our previous Cochrane review showed similar effect sizes³⁸. We therefore designed M4A to be powered for an effect size of d=0.34. We further assumed outcomes for MT and PT to be correlated with r=0.50. With a two-sided significance level of 5%, a sample of n=70 will be required for 80% power; to allow for attrition, we plan to recruit 80 participants. Similar power can be achieved for a range of reasonable effect sizes and correlations (Fig. 3). The same power can be reached with varying combinations of effect sizes, correlations, and numbers of participants with complete data; in contrast, a parallel design would require almost 300 participants (Supplementary Table S2). In summary, the sample size calculation for M4A appears realistic under a range of reasonable scenarios.
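The sample size reasoning above can be retraced with a short calculation. The following is a minimal sketch, not the authors' actual procedure (their models are in Supplementary methods): it assumes a paired comparison of the two interventions, a normal approximation with Guenther's small-sample correction, and no carry-over or period effects. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def crossover_n(d, r, alpha=0.05, power=0.80):
    """Participants needed for a paired (crossover) comparison.

    d: standardised effect size (mean difference / outcome SD)
    r: assumed correlation between outcomes under the two interventions
    Assumption: normal approximation plus Guenther's correction term.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    # Effect size expressed on the within-child differences
    d_z = d / sqrt(2 * (1 - r))
    return ceil((z_a + z_b) ** 2 / d_z ** 2 + z_a ** 2 / 2)


def parallel_n_total(d, alpha=0.05, power=0.80):
    """Total participants for a two-arm parallel design with equal arms."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    per_group = ceil(2 * (z_a + z_b) ** 2 / d ** 2 + z_a ** 2 / 4)
    return 2 * per_group


print(crossover_n(0.34, 0.50))   # crossover design
print(parallel_n_total(0.34))    # parallel design
```

Under these assumptions the sketch returns 70 participants for the crossover design and 274 for a parallel design, in line with the n=70 and "almost 300" stated above. With r=0.50 the difference-score effect size equals d, which is why the correlation assumption matters so much for the crossover advantage.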

Randomisation

Computer-generated randomisation lists, with randomly varying block sizes of 4-6, separate for each country, are used to ensure both unpredictability and balance. The lists were generated by a researcher who does not have contact with participants and are stored centrally, concealed from clinical investigators, until a decision on inclusion is made. Once a participant is enrolled, the randomisation result is conveyed to clinical investigators through an online system (REDCap on NORCE servers).
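A permuted-block list of this kind can be sketched as follows. This is an illustration only, not the trial's actual generator: the seeds, the list length of 40 per site, and the function name are hypothetical, and block sizes are assumed to be 4 or 6 because only even sizes can balance two sequences.

```python
import random

SEQUENCES = ("MT-PT", "PT-MT")


def blocked_sequence_list(n_participants, seed, block_sizes=(4, 6)):
    """Permuted-block list of intervention sequences for one site."""
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_participants:
        size = rng.choice(block_sizes)         # block size varies at random
        block = list(SEQUENCES) * (size // 2)  # equal numbers of each sequence
        rng.shuffle(block)                     # random order within the block
        allocations.extend(block)
    # The final block may be cut short, so balance is exact only per full block
    return allocations[:n_participants]


# One independent list per country (seeds are placeholders)
site_lists = {
    "Norway": blocked_sequence_list(40, seed=101),
    "Austria": blocked_sequence_list(40, seed=202),
}
```

Varying the block size keeps the next allocation hard to guess even for someone who knows the earlier ones, while balance within each block keeps the two sequences close to equal in number at every point of recruitment.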

Statistical analysis

All outcomes will be analysed on an intention-to-treat (ITT) basis; statistical models are described in Supplementary methods. Analysis for the present paper was pooled across intervention groups to maintain blinding of researchers. Baseline variables reported by Sharda et al. were pooled across groups using weighted means and pooled standard deviations for comparison with the present sample. Analyses were primarily descriptive and consisted of graphical analyses and summary statistics (absolute and relative frequencies, means, SDs). We examined the quality of fMRI scans using framewise displacement (FD)³⁹, an indicator of head motion, which is a common problem especially in children and clinical populations⁴⁰. Volumes exceeding a certain FD threshold (0.5 mm has been used³⁹, as have other values⁴¹) can be downweighted or scrubbed to obtain usable data³⁹; longer passages of high movement can be removed. Analyses were conducted in R and MATLAB.
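Pooling group-level summary statistics, as done here for the baseline variables of Sharda et al., can be sketched as below. The exact formula used is not stated in the text; this sketch assumes the sample-size-weighted mean and the conventional within-group pooled SD, which ignores any difference between the group means. The input values are made up for illustration.

```python
from math import sqrt


def pool_groups(ns, means, sds):
    """Combine per-group n, mean and SD into one weighted mean and pooled SD."""
    total_n = sum(ns)
    weighted_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Within-group pooled variance: weights are the degrees of freedom (n - 1)
    pooled_var = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / (total_n - len(ns))
    return weighted_mean, sqrt(pooled_var)


# Hypothetical two-group example (not data from either trial)
mean, sd = pool_groups(ns=[10, 30], means=[2.0, 4.0], sds=[1.0, 1.0])
print(mean, sd)
```

If the two group means differ substantially, the SD of the combined sample would be larger than this pooled value, so the choice of formula should be kept in mind when comparing spread across studies.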

Ethics

The study was approved by the Norwegian Regional Committees for Medical and Health Research Ethics (REK Sør-Øst D, 07 May 2021, reference 246145) and the Ethics Committee of the University of Vienna (21 April 2021, reference 00634) before the start of enrolment. Upon recruitment into the study, parents/caregivers of eligible participants provide written informed consent; children assent orally based on oral information and an invitation letter describing the study in simple terms. Terminating participation in the study is possible at any time during the process. All collected data are stored on a password-restricted server. Survey data are either entered directly by parents or teachers or independently entered twice by researchers if collected on paper.

Recruitment

Recruitment started in July 2021. The first participant was randomised on 20 September 2021. In the first recruitment wave (autumn 2021), a total of 41 participants (9 Bergen, 32 Vienna) were screened, of whom 27 were excluded before randomisation: 2 did not meet inclusion criteria; 18 did not consent; 7 were unable to complete an MRI scan. Fourteen participants (5 Bergen, 9 Vienna) were randomised. This was far below the original plan to randomise 27 participants each half year, meaning that 5-6 instead of 3 half-yearly waves may be required to reach the target sample size.
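The recruitment arithmetic above can be checked in a few lines. This is a sketch only; reading the lower end of "5-6 waves" as the 70 participants with complete data and the upper end as the 80 to be recruited is an assumption, and the function name is hypothetical.

```python
from math import ceil


def waves_needed(target, per_wave):
    """Half-yearly waves needed if every wave randomises `per_wave` children."""
    return ceil(target / per_wave)


screened = 41
excluded = 2 + 18 + 7             # ineligible + no consent + MRI not completed
randomised = screened - excluded  # 14

planned = waves_needed(80, 27)                     # 3 waves at the planned rate
observed_low = waves_needed(70, randomised)        # 5 waves
observed_high = waves_needed(80, randomised)       # 6 waves
print(randomised, planned, observed_low, observed_high)
```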

Baseline characteristics

Compared to Sharda et al., M4A participants were younger (9.07 vs. 10.25 years); proportions of boys (ca. 80%) and children with sentence-level speech (> 80%) were similar (Table 1). Many M4A participants had a co-occurring medical condition or were taking medication (Table 1); this information was not reported in Sharda et al. Comparing socio-economic status (SES) was not straightforward: Sharda et al. reported annual income in Canadian dollars and subjective SES on a 10-point scale, whereas M4A asked about income (categories in national currency) and education years and level of both parents. An attempt to compare income across studies is shown in Table 1; however, data were not collected in Bergen, and some parents in Vienna may have misunderstood the item.

Autism symptom severity was similar across samples. ADOS scores from pre-existing clinical reports, available only for a subset of children in both samples (8/14 in M4A; 35/51 in Sharda et al.), were similar (14.5 vs. 15.34); SRS T-scores were almost identical (71.5 vs. 71.17, but note difference in SD), suggesting moderate symptom severity (moderate range: 66 to 75)⁴²; Vineland maladaptive behaviour scores were slightly lower in M4A (18.5 vs. 19.9; Table 1).

In terms of functioning, pre-existing IQ scores were available only in a small subset of M4A, but in a high proportion of Sharda et al. (5/14 vs. 49/51 for full-scale IQ; Table 1). Parent-reported Vineland adaptive behaviour scores, which may serve as an alternative measure of functioning, were often not scorable, partly due to missing items, which may occur more easily when using paper-based forms (as in Vienna) than electronic versions (as in Bergen). We therefore decided to also use scores of social communication and receptive vocabulary in this category, which were available in all or nearly all cases (Table 1). Of those measures in this group that produce a standard score (IQ, Vineland adaptive behaviour, CCC-2 GCC, PPVT-4), all were on average in a lower range in M4A (mean values ranging from 63 to 89) compared to Sharda et al. (77 to 107). A similar picture emerged from the scales using v-scores (9 to 13.4 vs. 13.37 to 15.58). Functioning level of autistic children is often difficult or impossible to assess with standardised IQ tests.⁹ However, when considering all available functioning measures for each child, 8 were classified as low-functioning in at least one domain (at least one standard score < 70); 1 as high-functioning (at least one standard score ≥ 130); and the remaining 5 as functioning in the normal range (all standard scores ≥ 70 and < 130).
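The classification rule described above can be written out explicitly. This is a hedged sketch: the text does not say how a child with one score below 70 and another at or above 130 would be handled, so giving the low-functioning criterion precedence is an assumption, as are the function name and the example scores.

```python
def classify_functioning(standard_scores):
    """Classify one child from the available standard scores (M=100, SD=15)."""
    if not standard_scores:
        return "not classifiable"      # no standard score available
    if any(score < 70 for score in standard_scores):
        return "low-functioning"       # assumption: checked before the high criterion
    if any(score >= 130 for score in standard_scores):
        return "high-functioning"
    return "normal range"              # all scores >= 70 and < 130


# Hypothetical children, each with a different subset of available measures
print(classify_functioning([65, 88, 92]))   # low-functioning
print(classify_functioning([101, 134]))     # high-functioning
print(classify_functioning([85, 97]))       # normal range
```

Because the rule uses whichever measures are available, children with more completed assessments have more opportunities to meet either criterion, which is worth bearing in mind when comparing the resulting proportions across studies.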

Retention in the study; intervention completeness and fidelity

All 14 participants were retained in the study, started the first and the second intervention, and remained in the study until the last follow-up (Supplementary Fig. S2). In Sharda et al., the mean numbers of therapy sessions completed were 10.50 (SD 1.61) for those randomised to MT and 10.16 (SD 1.70) for those randomised to PT¹¹. In M4A, 11.36 (SD 1.28) sessions were completed for the first and 10.07 (SD 2.67) sessions for the second intervention. While many participants participated in the maximum number of 12 sessions, this number was much smaller in some participants, particularly for the second intervention (Fig. 4). Scheduling problems, possibly also compliance/motivation problems, were cited as causes. Rescheduling, in one case with the help of a new therapist, was attempted, but the number of sessions in these participants remained low.

While treatment fidelity will be reported in detail separately, the average number of activities in each session was around 4 as planned (MT: M=4.1, SD 0.99; PT: M=3.55, SD 0.75). Durations of each activity varied because some children insisted on continuing with one chosen activity, particularly in PT. The popularity of activities varied as well, with some activities being far more popular than others (e.g. piano in MT; Lego in PT; Table 2). This uneven distribution in M4A contrasts with Sharda et al. and indicates both children’s preferences and therapists’ willingness to follow these preferences.

Timing of interventions and follow-ups

Timing of interventions and assessments, as calculated from the randomisation date, is shown in Fig. 4. At baseline, the scan was usually first, followed by the parent and finally the teacher questionnaire. After randomisation, the first intervention usually started immediately. Some interruptions occurred due to school holidays and illness. Interventions were generally completed within the expected timeframe, with optional parent talks around the start and completion dates. For the follow-ups at 3 and 6 months, scans and parent questionnaires were often completed on time, with some exceptions. Teacher questionnaires at 3 months were more often delayed, likely because the time point coincided with school holidays. The second intervention usually started and ended in or near the expected time windows at 6 and 9 months. At 9 months, some teacher questionnaires were again delayed, again coinciding with school holidays and, in Norway, also with a national school strike. Parent questionnaires were also somewhat delayed at the final follow-up, but most assessments were completed within the expected time window.

Completeness of outcome assessments

The CCC-2 and MRI scans, as the primary clinical and neuroscientific outcomes, were preconditions for randomisation; completion rates at follow-up varied from 11/14 (79%) to 13/14 (93%; Supplementary Fig. S2). In one case in Bergen, the 6-month scan had to be aborted due to insufficient compliance with scanning procedures. No usable data were obtained, and the child subsequently refused to participate in the 9-month scan. One child in Vienna also stopped participating in the MRI scans at 6 months but continued with the therapy sessions and all behavioural assessments. At 9 months, another child in Vienna refused to participate in the last follow-up scan and assessments, although the CCC-2 was still obtained. Having information on the child prior to the baseline assessments was vital so that steps could be taken to help them cope. Preparation for the scan was equally important to ensure a positive experience for the families and compliance by children from the initial visit, especially as the procedure is repeated 4 times during the course of the study. Among the secondary clinical outcomes, completion rates for parent-rated outcomes varied from 10/14 (71%) to 14/14 (100%; Supplementary Fig. S2). One mother did not complete the questionnaires at baseline but did so at all subsequent time points. Another mother refused to complete the last follow-up visit and questionnaires. Completion rates for the PPVT-4 ranged from 12/14 (86%) to 14/14 (100%). The number of biological samples (hair and stool) we were able to collect was low, ranging from 3/14 (21%) to 7/14 (50%) at each time point. With most of the participants being boys with short haircuts, and the baseline starting directly after the summer holidays, we were often unable to obtain usable hair samples. Some children resisted having their hair cut by researchers or even parents. Stool samples were also difficult to obtain.
In Bergen, shortage of staff prevented collection of stool samples in the first wave; in response, we hired additional staff. In Vienna, one family did not consent to collecting stool samples; others experienced insufficient compliance by children. Prediction tasks were more challenging to conduct in Bergen than in Vienna due to staff shortage and technical problems. At both sites, the music prediction task was easier to complete for the participating children, due to the concept being more accessible and the tasks being shorter, whereas the action prediction task was more challenging. However, 7 out of 9 children in Vienna were able to complete both music and action prediction tasks. One child completed the music prediction task but failed to complete the action prediction task. In the Bergen sample, data collection for the prediction tasks could not begin before month 3. At month 3, all 5 participants completed both prediction tasks. At months 6 and 9, 3 out of 5 participants succeeded in completing both tasks. The question about the child’s preferred intervention at study end was answered for 13/14 participants.

Success of blinding

Out of 12 teachers who replied, 11 reported that they were not aware of the child’s allocation. One teacher described becoming aware that the child participated in music therapy, but not in which period this occurred. Thus, blinding teachers appeared to be successful.

Quality of baseline rs-fMRI scans

Some participants had low movement whereas others had shorter or longer periods with high movement (Supplementary Fig. S3). Participant 10 stopped after 477/750 volumes, but movement was low until then. Participant 8 had low movement (occasional peaks) until about volume 450. Participant 1 had two periods with low movement which may be analysed separately. The number of volumes with FD < 0.5 mm ranged from 436 in Participant 9 to 750 in Participants 2 and 14 (M=621, SD=110).
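To make the motion summaries above concrete, the following sketch computes FD from realignment parameters and lists contiguous low-motion stretches. It assumes the widely used definition of FD (sum of absolute volume-to-volume changes in three translations and three rotations, with rotations converted to millimetres on a 50 mm sphere); the trial's actual pipeline, the parameter order, and the units are assumptions, and the example values are invented.

```python
def framewise_displacement(motion, head_radius_mm=50.0):
    """FD per volume from rows of [tx, ty, tz, rx, ry, rz].

    Assumes translations in mm and rotations in radians; the first
    volume has no predecessor and is assigned FD = 0.
    """
    fd = [0.0]
    for prev, cur in zip(motion, motion[1:]):
        trans = sum(abs(c - p) for p, c in zip(prev[:3], cur[:3]))
        # Rotation change converted to arc length on a sphere of given radius
        rot = sum(abs(c - p) * head_radius_mm for p, c in zip(prev[3:], cur[3:]))
        fd.append(trans + rot)
    return fd


def low_motion_runs(fd, threshold=0.5):
    """(start, end) index pairs, end exclusive, of consecutive volumes with FD < threshold."""
    runs, start = [], None
    for i, value in enumerate(fd):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(fd)))
    return runs


# Invented five-volume example with one abrupt rotation at volume 2
motion = [
    [0.0, 0.0, 0.0, 0.00, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.00, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.02, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.02, 0.0, 0.0],
    [0.2, 0.1, 0.0, 0.02, 0.0, 0.0],
]
fd = framewise_displacement(motion)
usable = sum(value < 0.5 for value in fd)
print(usable, low_motion_runs(fd))
```

In this example four of the five volumes fall below the threshold and they form two separate runs, mirroring the case of a participant with two low-movement periods that could be analysed separately.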

Adverse events

Parents of two children reported adverse events at 3 months; one of them was described as serious. Both were discussed with the Data and Safety Monitoring Committee and were found to be transient and unrelated to study procedures. No adverse events were reported at 6 and 9 months.

The M4A trial was designed to replicate and expand a previous trial of music versus play-based therapy for children with ASD. Here, we described the design and implementation of the M4A trial, focusing on replicability and feasibility as well as transparency of methods. Overall, we found this replication study feasible, but we also identified some important challenges.

Recruitment is challenging in all RCTs, and M4A is no exception. Population size and structure of autism services in the area both limit the number of participants that can be included. Bergen is a relatively small city. Vienna is a bigger city, but with more decentralised autism services. A challenge in M4A is that participation requires the ability to lie still in a scanner, which can be difficult for many children who would otherwise be eligible. Many families who were interested in participating were unable to do so because it was impossible to scan the children, thereby reducing the number of children who could be successfully recruited. Scanning procedures were greatly facilitated by preparing adequately beforehand. However, despite preparations and multiple visits to the scanner, in some cases it was not appropriate to scan the children as they were too distressed by the experience. Conversely, preparation enabled many children, whose parents initially did not believe they would be able to lie still, to undergo scanning successfully and even to have a positive experience.

Besides recruitment, preparations for MRI scans also influence generalisability. Many studies exclude low-functioning autistic children, thus limiting generalisability of findings. Other studies might use sedation to obtain structural images. Low-functioning autistic children may be most in need of services and also most likely to benefit from MT⁷. It is therefore important to note that M4A was able to include such children and obtain non-sedated resting-state fMRI scans.

A further set of challenges arises from the multinational nature of the replication study. These include language barriers; the availability of assessments in each language (the CCC-2 in Vienna, the PPVT-4 in Norway, and the CASP and FQoL in both countries were translated in-house); and unexpected variations in scoring procedures and scales, which required adaptations for consistency.

Adequate timing of interventions and follow-ups in trials such as M4A poses logistical challenges because it requires the collaborative and timely effort of many people: therapists, parents, teachers, children, assessors, radiologists, and researchers. Parents and children are asked to complete a considerable number of assessments, usually spanning two separate in-person visits (one for scanning and one for the receptive vocabulary and prediction tasks), and parents additionally complete 5 online questionnaires at each time point. Parents were generally highly motivated to collaborate in the trial, and although some assessment procedures were challenging for the children, they have in general been well tolerated and feasible. Teacher questionnaires were often delayed because they tended to coincide with holiday periods; additionally, teachers might be less invested than parents in participating in the trial. Compensation for their participation might help and has since been offered to teachers. While reporting delays in starting interventions is now a requirement in CONSORT⁴³, the actual timing of assessments in relation to both randomisation and interventions is rarely if ever reported in RCTs, although it may well introduce bias.

Strengths and limitations of the study design

An important strength of M4A is its randomised design, which allows for causal conclusions and is considered the gold standard design in evaluating intervention effects. The crossover design of M4A requires far fewer participants to achieve sufficient power and may help to ensure feasibility; additionally, individual preferences can be examined based on each participant’s experience of both interventions⁴⁴. However, crossover designs require additional assumptions which are difficult to assess, including chronicity/stability of the condition, reversibility of outcomes, and limited duration of intervention effects. Although ASD is a lifelong condition, associated behaviours are not necessarily stable over time, especially in children. The interventions are also hoped to lead to lasting learning, so that the second baseline may not be comparable to the first one; additionally, the relative benefit of each intervention may depend on the order in which they are received. However, with a relatively long wash-out period (equal to the intervention period), we hope to attenuate these problems. A practical consideration is the longer time frame for each participant. Families may be more motivated to participate in a study where they can receive both interventions, compared to a parallel study where they may not receive their preferred one. Overall, the high retention and completion rates in M4A are consistent with this. However, for some participants, the longer study duration and higher number of assessments may also present a burden. A final limitation of the M4A study design is that power was calculated for the primary outcome, so that analyses of secondary outcomes, relations between biomarkers and psychosocial outcomes, and clinically defined subgroups will need to be considered exploratory.
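
The sample-size advantage of a crossover design can be illustrated with a normal-approximation calculation. The sketch below is not the trial's actual power calculation: the standardised effect size `d`, the within-participant correlation `rho`, and both function names are hypothetical, and the formulas assume no carryover or period effects.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """Sum of the two-sided critical value and the power quantile."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def total_n_parallel(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Total N for a two-arm parallel trial (normal approximation).

    d is the standardised mean difference between arms (hypothetical input).
    """
    per_group = 2 * _z_sum(alpha, power) ** 2 / d ** 2
    return 2 * math.ceil(per_group)


def total_n_crossover(d: float, rho: float, alpha: float = 0.05,
                      power: float = 0.80) -> int:
    """Total N for a 2x2 crossover trial (normal approximation).

    Each participant contributes a within-person difference whose variance
    is 2 * sigma^2 * (1 - rho), where rho is the correlation between a
    participant's outcomes under the two interventions (hypothetical input).
    """
    n = 2 * (1 - rho) * _z_sum(alpha, power) ** 2 / d ** 2
    return math.ceil(n)


# Illustrative values only; not taken from the M4A protocol.
print(total_n_parallel(0.5))            # 126
print(total_n_crossover(0.5, rho=0.5))  # 32
```

Under these assumptions the crossover total is (1 − rho)/2 times the parallel total before rounding, so the saving grows with the correlation between a participant's outcomes under the two interventions.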

Potential impact of the expected findings

As a replication study in a different cultural context with an increased sample size and additional measures, M4A may solidify the original findings as well as expand our understanding of MT’s biological underpinnings and their correlations with clinical outcomes. The existing base of empirical studies on the effects of MT on the brain is still small. In addition to the trial by Sharda et al., which M4A seeks to replicate, we are aware of a more recent RCT of drum training with autistic adolescents⁴⁵ and a non-randomised trial of improvisational MT for autistic children with no results published to date⁴⁶. In the former, eight weeks of drum training compared to no intervention led to connectivity increases (right inferior frontal gyrus and right dorsolateral prefrontal cortex) and improvements in rhythmic accuracy, but had limited effects on behaviour problems. Although it may be debated how training offered by drum tutors differs from MT, the study is interesting in selecting rhythm as one component. MT as an intervention with few side effects⁸,⁹ may be more accessible than other approaches that are more invasive (e.g. brain stimulation, medication) or costly (e.g. early intensive behavioural intervention; biofeedback). Since the two interventions in M4A differ only in the use of music, its findings will also shed light on music-making as an “active ingredient.” Effects of music on the brain have more often been studied by comparing musicians to musically naive control participants⁴⁷. Given the quantitative and qualitative similarities between music-making in a therapeutic setting and music learning, convergent results would support the notion that music-making affects the brain under circumstances other than intensive training. Considering that MT has been applied for decades and that recent systematic reviews suggest positive effects on different clinical outcomes, further studies examining its effects on the brain are needed.

Since M4A includes children with different levels of cognitive functioning, the results could provide further insights into MT’s effects on different subgroups of ASD, which could lead to a better selection of cases. The relations between clinically or neuroscientifically defined subgroups and outcomes may be complex, and additional analyses could be used to explore these relations. Data-driven methods such as machine learning can be used to discern patterns in the entirety of the collected data. In addition, resting-state fMRI data can be processed with alternative analysis methods such as graph theory and dynamic functional connectivity.
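
As a rough illustration of what dynamic functional connectivity and a simple graph-theoretic measure involve, the sketch below computes sliding-window correlation matrices from regional time series and counts suprathreshold edges per region. It is not the M4A analysis pipeline: the data are synthetic, and the window length, step, threshold, and function names are all hypothetical choices.

```python
import numpy as np


def sliding_window_fc(ts: np.ndarray, window: int, step: int = 1) -> np.ndarray:
    """Correlation matrix per sliding window.

    ts has shape (n_timepoints, n_regions); the result has shape
    (n_windows, n_regions, n_regions).
    """
    n_t = ts.shape[0]
    if window > n_t:
        raise ValueError("window is longer than the time series")
    starts = range(0, n_t - window + 1, step)
    return np.stack([np.corrcoef(ts[s:s + window], rowvar=False) for s in starts])


def node_degree(fc: np.ndarray, threshold: float) -> np.ndarray:
    """Number of edges per region after binarising |r| > threshold."""
    adj = np.abs(fc) > threshold
    np.fill_diagonal(adj, False)  # ignore self-connections
    return adj.sum(axis=0)


# Synthetic example: 120 volumes, 4 regions; regions 0 and 1 are coupled.
rng = np.random.default_rng(0)
ts = rng.standard_normal((120, 4))
ts[:, 1] = ts[:, 0] + 0.1 * rng.standard_normal(120)

dfc = sliding_window_fc(ts, window=30, step=10)  # one matrix per window
variability = dfc.std(axis=0)                    # edge-wise variability over time
static_fc = np.corrcoef(ts, rowvar=False)
degree = node_degree(static_fc, threshold=0.8)
```

In practice, such analyses would follow motion correction and other preprocessing, and the results can depend strongly on the window length and threshold chosen.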

In conclusion, this report on the design and implementation of a multinational replication study combining a strong clinical trial design with brain imaging suggests that such studies are feasible but challenging to implement and require careful planning to be successful.

Author contributions

AGr and MR coordinated recruitment, assessment, and interventions. AGu designed the behavioural music prediction tasks. AUK conducted and analysed assessments. KK analysed framewise displacement. NM designed the behavioural action prediction tasks. UMN advised on hair cortisol methods. MBP helped with recruitment of participants and advised on clinical assessments. MSG monitored treatment fidelity. RS advised on neuroimaging methods and analyses. BT conducted neuroimaging assessments. ICW designed and supervised the microbiome data collection. CG conducted statistical analyses. CG, GS, and KS designed the study and supervised its conduct. All authors contributed to drafting and critical revisions of the manuscript, approved the final version, and agree to be accountable for the work.

Acknowledgements

Aparna Nadig and Megha Sharda provided input on methodology as scientific advisors to the trial to help ensure appropriate replication of their original study. They provided materials for conducting and assessing fidelity of interventions as well as for conducting and analysing assessments.

Ann Heidi Båtvik, Camilla Figini, Håkon Albert Gåskjenn, Johannes Melhus Medlien, Thea Margrethe Tørrissen (Bergen), Jochen Haas, Luise Geissler, Luise Graichen, Alva Chiara Lazar-Milian, and Lara Rump (Vienna) helped with data collection.

Roger Barndon, Lars Ersland, Tor Erland Fjørtoft, Christel Jansen, Eva Øksnes, Trond Martin Øvreaas, Turid Irene Randa (Bergen), and Ronald Sladky (Vienna) were involved in setting up the sequencing parameters as well as preparing and scanning participants.

Mike Crawford (Chair), Paul Bassett, and Luise Poustka served as the independent Data and Safety Monitoring Committee (DSMC) and advised on study progress and safety. Jörg Assmus served as trial statistician and provided unblinded analyses to the DSMC.

Frida Anita Vangen, Johannes Melhus Medlien (Bergen), Dominik Denkmayr, Marlene Emminger, and Eva Madeleine Unterhofer (Vienna) provided interventions to study participants and their parents.

Kristin Whitehouse (Bergen), Jonathan Gärtner, and Helene Hodor (Vienna) served as user representatives to help ensure the relevance of the trial to autistic people and their relatives. Before the project started, user representatives discussed the project; confirmed the relevance of the primary outcome; commented on the importance of secondary outcomes; and gave advice on the relevance of intervention elements. During the project, they continued to give advice to the project team and helped with recruitment and dissemination.

Conflict of interest

The authors declare that they have no conflict of interest.

Data sharing

De-identified clinical and neuroimaging data will be made accessible for re-use by other researchers via platforms such as ENIGMA. De-identified clinical data will also be stored in a publicly available repository (Open Science Framework, https://osf.io/).

References

American Psychiatric Association. Diagnostic and statistical manual of mental disorders (5th ed.). American Psychiatric Association: Arlington, VA, 2013 doi:10.1176/appi.books.9780890425596.
Rogge N, Janssen J. The economic costs of autism spectrum disorder: a literature review. J Autism Dev Disord 2019; 49: 2873–2900.
Hirvikoski T, Mittendorfer-Rutz E, Boman M, Larsson H, Lichtenstein P, Bolte S. Premature mortality in autism spectrum disorder. Br J Psychiatry 2016; 208: 232–8.
Fuld S. Autism spectrum disorder: The impact of stressful and traumatic life events and implications for clinical practice. Clin Soc Work J 2018; 46: 210–219.
Kanner L. Autistic disturbances of affective contact. Nervous Child 1943; 2: 217–250.
Alvin J. Music therapy for the autistic child. Oxford University Press: Oxford, UK, 1978.
Sharda M, Silani G, Specht K, Tillmann J, Nater U, Gold C. Music therapy for children with autism: investigating social behaviour through music. Lancet Child Adolesc 2019; 3: 759–761.
Geretsegger M, Fusar-Poli L, Elefant C, Mössler KA, Vitale G, Gold C. Music therapy for autistic people. Cochrane Database of Systematic Reviews 2022. doi:10.1002/14651858.CD004381.pub4.
Bieleninik L, Geretsegger M, Mössler K, Assmus J, Thompson G, Gattino G et al. Effects of improvisational music therapy vs enhanced standard care on symptom severity among children with autism spectrum disorder: The TIME-A randomized clinical trial. JAMA 2017; 318: 525–535.
Crawford MJ, Gold C, Odell-Miller H, Thana L, Faber S, Assmus J et al. International multicentre randomised controlled trial of improvisational music therapy for children with autism spectrum disorder: TIME-A study. Health Technol Assess 2017; 21: 1–40.
Sharda M, Tuerk C, Chowdhury R, Jamey K, Foster N, Custo-Blanch M et al. Music improves social communication and auditory-motor connectivity in children with autism. Transl Psychiatry 2018; 8: 231.
Holiga S, Hipp JF, Chatham CH, Garces P, Spooren W, D’Ardhuy XL et al. Patients with autism spectrum disorders display reproducible functional connectivity alterations. Sci Transl Med 2019; 11. doi:10.1126/scitranslmed.aat9223.
Uddin LQ. Idiosyncratic connectivity in autism: developmental and anatomical considerations. Trends in Neurosciences 2015; 38: 261–263.
Supekar K, Uddin LQ, Khouzam A, Phillips J, Gaillard WD, Kenworthy LE et al. Brain Hyperconnectivity in Children with Autism and its Links to Social Deficits. Cell Reports 2013; 5: 738–747.
Botvinik-Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature 2020; 582: 84–88.
Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS et al. Reproducible brain-wide association studies require thousands of individuals. Nature 2022; 603: 654–660.
Sharda M, Midha R, Malik S, Mukerji S, Singh NC. Fronto-Temporal Connectivity is Preserved During Sung but Not Spoken Word Listening, Across the Autism Spectrum: Intact sung-word connectivity in ASD. Autism Res 2015; 8: 174–186.
Silberman S. NeuroTribes: The legacy of autism and the future of neurodiversity. Penguin Random House: New York City, NY, USA, 2015.
Ogawa S, Lee YA, Yamaguchi Y, Shibata Y, Goto Y. Associations of acute and chronic stress hormones with cognitive functions in autism spectrum disorder. Neuroscience 2017; 343: 229–239.
Wuttke-Linnemann A, Feneberg A, Nater UM. Music and health. In: Gellman M (ed). Encyclopedia of Behavioral Medicine. Springer Nature: Cham, Switzerland, 2019 doi:10.1007/978-1-4614-6439-6_101901-1.
Uscinska M, Mattiot AP, Bellino S. Treatment-induced brain plasticity in psychiatric disorders. In: Palermo S, Morese R (eds). Behavioral Neuroscience. IntechOpen: London, UK, 2019 doi:10.5772/intechopen.85448.
Gaser C, Schlaug G. Brain structures differ between musicians and non-musicians. J Neurosci 2003; 23: 9240–9245.
Haker H, Schneebeli M, Stephan KE. Can Bayesian Theories of Autism Spectrum Disorder Help Improve Clinical Practice? Frontiers in Psychiatry 2016; 7. https://www.frontiersin.org/article/10.3389/fpsyt.2016.00107 (accessed 26 Mar 2022).
Amoruso L, Narzisi A, Pinzino M, Finisguerra A, Billeci L, Calderoni S et al. Contextual priors do not modulate action prediction in children with autism. Proceedings of the Royal Society B: Biological Sciences 2019; 286: 20191319.
Gold BP, Mas-Herrero E, Zeighami Y, Benovoy M, Dagher A, Zatorre RJ. Musical reward prediction errors engage the nucleus accumbens and motivate learning. Proceedings of the National Academy of Sciences 2019; 116: 3310–3315.
Vuong HE, Hsiao EY. Emerging Roles for the Gut Microbiome in Autism Spectrum Disorder. Biological Psychiatry 2017; 81: 411–423.
Yap CX, Henders AK, Alvares GA, Wood DLA, Krause L, Tyson GW et al. Autism-related dietary preferences mediate autism-gut microbiome associations. Cell 2021; 184: 5916-5931.e17.
World Health Organization. International classification of diseases for mortality and morbidity statistics (11th Revision). 2018. https://icd.who.int/browse11/l-m/en.
Lord C, Rutter M, DiLavore P, Risi S. Autism Diagnostic Observation Schedule (ADOS). Western Psychological Services: Los Angeles, CA, 2001.
Lord C, Storoschuk S, Rutter M, Pickles A. Using the ADI-R to diagnose autism in preschool children with autism. Infant Mental Health Journal 1993; 14: 234–252.
Schuck RK, Tagavi DM, Baiden KMP, Dwyer P, Williams ZJ, Osuna A et al. Neurodiversity and Autism Intervention: Reconciling Perspectives Through a Naturalistic Developmental Behavioral Intervention Framework. J Autism Dev Disord 2022; 52: 4625–4645.
Fuller AM, Kaplun C, Short AE. The application of the Music Therapy Visual Schedule Approach (MT-ViSA) within a group music therapy program. Nordic Journal of Music Therapy 2022; 31: 153–175.
Bellg AJ, Borrelli B, Resnick B, Hecht J, Minicucci DS, Ory M et al. Enhancing treatment fidelity in health behavior change studies: best practices and recommendations from the NIH Behavior Change Consortium. Health Psychol 2004; 23: 443–451.
Borrelli B, Sepinwall D, Ernst D, Bellg AJ, Czajkowski S, Breger R et al. A new tool to assess treatment fidelity and evaluation of treatment fidelity across 10 years of health behavior research. Journal of Consulting and Clinical Psychology 2005; 73: 852–860.
Norbury CF, Nash M, Baird G, Bishop DVM. Using a parental checklist to identify diagnostic groups in children with communication impairment: a validation of the Children’s Communication Checklist—2. Int J Lang Comm Dis 2004; 39: 345–364.
Helland WA, Biringer E, Helland T, Heimann M. The usability of a Norwegian adaptation of the Children’s Communication Checklist Second Edition (CCC-2) in differentiating between language impaired and non-language impaired 6- to 12-year-olds. Scand J Psychol 2009; 50: 287–292.
Vanderwal T, Kelly C, Eilbott J, Mayes LC, Castellanos FX. Inscapes: A movie paradigm to improve compliance in functional magnetic resonance imaging. NeuroImage 2015; 122: 222–232.
Geretsegger M, Elefant C, Mössler KA, Gold C. Music therapy for people with autism spectrum disorder. Cochrane Database of Systematic Reviews 2014. doi:10.1002/14651858.CD004381.pub3.
Power JD, Barnes KA, Snyder AZ, Schlaggar BL, Petersen SE. Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage 2012; 59: 2142–2154.
Pardoe HR, Kucharsky Hiess R, Kuzniecky R. Motion and morphometry in clinical and nonclinical populations. NeuroImage 2016; 135: 177–185.
Smith J, Wilkey E, Clarke B, Shanley L, Men V, Fair D et al. Can this data be saved? Techniques for high motion in resting state scans of first grade children. Developmental Cognitive Neuroscience 2022; 58: 101178.
Constantino JN, Gruber CP. Social Responsiveness Scale (SRS). Western Psychological Services: Los Angeles, CA, 2005.
Boutron I, Altman DG, Moher D, Schulz KF, Ravaud P, CONSORT NPT Group. CONSORT Statement for Randomized Trials of Nonpharmacologic Treatments: A 2017 Update and a CONSORT Extension for Nonpharmacologic Trial Abstracts. Ann Intern Med 2017; 167: 40–47.
Dwan K, Li T, Altman DG, Elbourne DR. CONSORT 2010 statement: extension to randomised crossover trials. BMJ 2019; 366. doi:10.1136/bmj.l4378.
Cahart M-S, Amad A, Draper SB, Lowry RG, Marino L, Carey C et al. The effect of learning to drum on behavior and brain function in autistic adolescents. Proc Natl Acad Sci USA 2022; 119: e2106244119.
Kim J, Kim B-N. Improvisational music therapy for children with autism spectrum disorder assessed using brain imaging. 2019. doi:10.1186/ISRCTN18340173.
Olszewska AM, Gaca M, Herman AM, Jednoróg K, Marchewka A. How Musical Training Shapes the Adult Brain: Predispositions and Neuroplasticity. Front Neurosci 2021; 15: 630829.

Table 1. Baseline characteristics of participants in M4A compared to Sharda et al.

	Sharda et al.		M4A
Continuous variables	n	M (SD)	n	M (SD)
Demographics
Age (in years)	51	10.25 (1.87)	14	9.07 (1.94)
Annual income (in 1000€)³	50	30.76 (22.4)	7	53.64 (41.47)
Severity (high=poor)
Autism severity (ADOS total)¹	35	15.34 (5.14)	8	14.5 (6.65)
Autism severity, parent-reported (ADI-R score)			5	36 (8.15)
Social responsiveness (SRS-II T-score)⁵	51	71.17 (10.49)	12	71.5 (4.76)
Maladaptive behaviours (Vineland v-score)⁴	48	19.9 (1.67)	12	18.5 (1.98)
Functioning (high=good)
Verbal IQ²	48	91.16 (22.49)	2	64.5 (0.71)
Nonverbal IQ²	45	106.87 (18.47)	2	71.5 (20.51)
Full-scale IQ²	49	98.08 (18.76)	5	89.2 (40.14)
Adaptive behaviour (Vineland standard score)⁴			3	63 (1)
Gross motor skills (Vineland v-score)⁴	51	13.37 (2.12)	5	13.4 (3.78)
Fine motor skills (Vineland v-score)⁴	50	15.58 (2.86)	5	9 (2.92)
Social communication (CCC-2 GCC standard score)⁶	48	77.23 (13.79)	14	72.57 (17.46)
Receptive vocabulary (PPVT-4 standard score)⁷	51	90.12 (27.92)	13	88.15 (20.27)
Other
Participation (CASP)			12	69.62 (11.2)
Family quality of life⁸	51	103.21 (13.59)	12	89.92 (23.35)
Categorical variables	N	n (%)	N	n (%)
Sex: male	51	43 (84)	14	11 (79)
Parent-reported sentence level speech⁹	51	42 (82)	14	13 (93)
Diagnosis: Childhood autism (F84.0)			14	4 (29)
Asperger's syndrome (F84.5)			14	3 (21)
Other			14	7 (50)
Any known medical condition			14	9 (64)
Attention-deficit hyperactivity disorder (F90)			14	5 (36)
Epilepsy			14	2 (14)
Diagnosed hearing loss			14	0 (0)
Any medication			14	5 (36)

Note. This table compares pooled data from Table 1 of the study by Sharda et al. to the M4A sample randomised in the first wave. It shows variables collected in both studies and variables collected only in M4A. Sharda et al. additionally reported baseline variables on socioeconomic status (MacArthur SES Ladder), handedness (augmented 15-item laterality index of the Edinburgh Handedness Inventory), musical ability (global accuracy score on the Montreal Battery for Evaluation of Musical Abilities, MBEMA), and number of participants meeting criteria for language impairment (1 SD or greater below the mean on the Sentence Repetition subtest of Clinical Evaluation of Language Fundamentals, CELF-4). ¹ADOS – Autism Diagnostic Observation Schedule (ADOS or ADOS-2) total score. Higher scores indicate greater symptom severity. ²Intelligence quotient from any scale; standardised to M=100 (SD=15). ³Self-reported by parents. Sharda et al.’s income data were converted from CAD to EUR using the exchange rate of May 2022. Not collected in Bergen; quantified estimates from categorical responses in Vienna. ⁴Vineland Adaptive Behaviour Scales, v-scale scores standardised to M=15 (SD=3). Scores for gross motor skills, fine motor skills, and maladaptive behaviours subdomains were also reported in Sharda et al.; scores for adaptive behaviour (ABC scores) in M4A only. Except for maladaptive behaviours, higher scores indicate better outcomes. ⁵SRS-II: Social Responsiveness Scale, T-scale scores standardised to M=50 (SD=10). Higher scores indicate poorer skills. ⁶CCC-2: Children’s Communication Checklist, GCC=General Communication Composite, standard score standardised to M=100 (SD=15). Higher scores indicate better communication skills. ⁷PPVT-4: Peabody Picture Vocabulary Test; standardised to M=100 (SD=15). Higher scores indicate better receptive vocabulary. ⁸Beach Center Family Quality of Life Questionnaire, possible range 25-125. Higher scores indicate better quality of life. ⁹As reported by parents.
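
Because the measures in Table 1 are standardised to different norms, one way to place them on a common footing is to express each as a z-score relative to its normative mean and SD. The sketch below uses the norms stated in the note and three M4A means from the table purely as arithmetic examples; the `SCALES` mapping and `to_z` function are hypothetical helpers, not part of the study's analysis.

```python
# Normative mean and SD for each score type, as stated in the table note.
SCALES = {
    "standard": (100.0, 15.0),  # IQ, CCC-2 GCC, PPVT-4
    "t": (50.0, 10.0),          # SRS-II T-score
    "v": (15.0, 3.0),           # Vineland v-scale score
}


def to_z(score: float, scale: str) -> float:
    """Distance of a score from its normative mean, in normative SD units."""
    mean, sd = SCALES[scale]
    return (score - mean) / sd


# M4A sample means from Table 1 (direction of "better" differs by measure).
srs_z = to_z(71.5, "t")            # social responsiveness, high = poorer
ccc_z = to_z(72.57, "standard")    # social communication, high = better
maladaptive_z = to_z(18.5, "v")    # maladaptive behaviours, high = poorer

print(round(srs_z, 2), round(ccc_z, 2), round(maladaptive_z, 2))
# 2.15 -1.83 1.17
```

A z-score only removes differences in scaling; it does not make instruments that measure different constructs, or that were normed on different populations, directly interchangeable.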

Table 2. Frequency of activities used in each intervention

Music therapy		Play-based therapy
Activity	Frequency	Activity	Frequency
Piano	17	Lego	16
Djembe	9	Halli galli	10
Kazoo	9	Beanbags	9
Boomwhackers	8	Play doh	8
Ukulele	8	Finger puppets	7
Eggshakers	7	Bubbles	6
Microphone	7	Four in row	5
Xylophone	6	Puzzle	4
Flute	6	Jenga	4
Bells	5	Darts	1
Kalimba	2	Non-musical book	1
Musical book	1	Memory cards¹	1
Finger puppets²	1

Note. Based on analysis of 41 videos (for each of 14 participants, 1 early, 1 middle, and 1 late session was selected; one PT stopped early so that only 2 videos were analysed; total 21 MT, 20 PT; all from the first intervention). ¹This activity was not from the list; the child brought it from home. ²This activity was not from the list for this intervention.

