Study design, participants, and settings
M4A is an assessor-blinded crossover RCT (Fig. 1) comparing MT to a structurally matched PT intervention, combining behavioural and neuroscientific outcome measures (Fig. 2). Eligible children are aged 6-12 years and have an ASD diagnosis from a licensed health professional with an ICD code28, ideally supported by scores from standardised tools (ADOS29, ADI-R30) and an assessment of intelligence quotient (IQ) or level of ability. Children are excluded if they have received individual MT during the last 6 months; had generalised epileptic seizures during the last 12 months; have more than 12 months of cumulative music training; or have conditions precluding fMRI scanning (such as metallic or electronic implants, claustrophobia, or persistent problems in complying with scanning procedures). Participants are recruited in two cities (Bergen, Norway; Vienna, Austria) by advertising through social media and reaching out to relevant groups such as parent associations, schools, clinicians, and special educators. Baseline assessment includes demographic data, behavioural variables, MRI scans, and other biological data (Fig. 2). Participants are randomised to a sequence of interventions (MT-PT or PT-MT). The first intervention takes place over 3 months, followed by a washout period of 3 months, and then by 3 months of the second intervention (Fig. 1). When possible, enrolment is scheduled so that interventions take place during the school year and washout periods during school holidays. We expected around 27 participants to be randomised every half year across both sites combined. Outcome assessments are conducted before and after each intervention period, giving a total of four time points (Figs. 1, 2). Changes to the protocol since the initial version are listed in Supplementary Table S1.
Interventions
Locations (a dedicated room at each site) and materials (matched to those in Sharda et al. as closely as possible) were standardised to ensure consistent implementation in Bergen and Vienna. Both interventions consist of 12 weekly one-on-one sessions of 45 minutes each, conducted in the same setting by a licensed music therapist. Although we initially planned to conduct MT and PT in accordance with manuals from Sharda et al., it became necessary to specify and develop them further to provide sufficient detail for replication across therapists and sites. The treatment manual and the fidelity assessments will be described in more detail in separate publications. Briefly, both MT and PT use a developmentally oriented approach with the overall aim of improving quality of life31 by creating a shared experience, building meaningful relationships, and fostering self-expression. A varied set of activities, combining therapist- and child-led interactions, targets common goals including sensory integration, social communication, and emotional development. Children can choose 4 activities per session using a visual schedule, which facilitates communication and provides structure to the sessions32. MT uses rhythmic cues, musical instruments (e.g. piano, djembe, xylophone, ukulele), songs, and stories accompanied by songs. PT is designed as a play-based active comparison condition to control for factors such as support, therapist attention, positive expectancies, and emotional engagement. It uses verbal interaction, toys (e.g. Lego, finger puppets, Play-Doh, puzzles), and stories (when possible, the same as in MT), without a musical component. Treatment fidelity33,34 is monitored and ensured via therapists’ reports, supervision meetings, and video recordings of sessions. For every child, three videos from each intervention arm (first, middle, and last session) are assessed by two independent raters.
The fidelity checklist covers dimensions such as dose (number of sessions); structure (number and type of activities); content (use of musical or verbal reinforcements, use of feeling cards); and processes (therapist’s attunement, therapist’s responsiveness). In addition, child engagement and interaction levels are investigated. Differentiation between MT and PT is analysed by comparing the amount of musical and verbal reinforcement used.
Outcomes
Primary clinical outcome
The primary clinical outcome is social communication after 3 months (at the end of each therapy), as assessed by a blinded assessor using the General Communication Composite (GCC) standard score of the Children’s Communication Checklist (CCC-2).35,36 The CCC-2 measures aspects of pragmatic communication with 70 items across 10 subdomains (speech, syntax, semantics, coherence, inappropriate initiation, stereotyped language, use of context, nonverbal communication, social relations, and interests) and has shown high interrater reliability and internal consistency.35 The GCC standard score is standardised to M=100 (SD=15), with higher scores indicating better social-communication skills (reversed from raw scores). Whereas Sharda et al. relied on parent reports for this outcome, for which attempted blinding was unsuccessful11, M4A relies on a special educator/teacher who knows the child well. Success of blinding is verified at the last follow-up by asking the assessor about incidental unblinding.
Secondary clinical outcomes, measured at the same time points as the primary clinical outcome, include 2.) participation (Child and Adolescent Scale of Participation, CASP); 3.) family quality of life (Beach Center Family Quality of Life Scale); 4.) receptive vocabulary (Peabody Picture Vocabulary Test, PPVT; conducted by a health professional); 5.) symptom severity (Social Responsiveness Scale, SRS); 6.) adaptive behaviour (Vineland Adaptive Behavior Scales); 7.) chronic stress (hair cortisol); 8.) adverse events; and 9.) preference. Participation and quality of life are the two main secondary clinical outcomes because of their particular importance to participants. All secondary clinical outcomes except receptive vocabulary and chronic stress are parent-rated; for details, see Supplementary methods.
Brain connectivity of frontotemporal regions will be derived from resting-state fMRI, which measures functional connectivity between brain areas while no explicit task is performed. Connectivity of six seed areas will be used as the main neuroscientific outcome. Seeds are anatomically defined regions of interest (ROIs) in Montreal Neurological Institute space for the left and right HG, IFG, and TP. These seeds anchor frontotemporal networks that are involved in language and communication and that may be altered in ASD and modified by MT.11 The timeseries for each of the seeds will be used to generate individual participant-level maps using whole-brain general linear models at baseline and after the interventions. These first-level maps will then be entered into the second-level analyses. Results will be reported at a 5% significance level, adjusted for multiplicity by controlling the familywise error rate. Z-scores of parameter estimates will be used to measure connectivity strength. Structural brain changes (GMV and WMV) will be assessed using voxel-based morphometry (VBM), derived from the anatomical T1 image, which is acquired at the beginning of each scanning procedure. ROIs include the 6 seeds above as well as other areas identified in our previous review (cerebellum, superior temporal sulcus, temporo-parietal area).7 Scanning parameters were harmonised across sites (see Supplementary methods). The brain scan procedure requires children to be as still as possible for approximately 16 minutes inside the MRI scanner. To facilitate the procedure, children are shown movies during the scan: an entertaining movie (Tom and Jerry or a preferred movie) during the T1 scan, preparatory scans, and breaks between scans, and a specifically designed video37 during resting state.
To help families prepare, we provide parents with a movie showing the surroundings, the procedure, and the preparation for the scan; a sound file with the recorded scanner noise; and an individualised visual schedule for the visit (Supplementary Figure S1). On the day of the scan, researchers maintain a calm and patient attitude to increase the likelihood of cooperation and inform radiographers about any specific details concerning the child. After a successful scan, children receive a toy set of an MRI scanner (Playmobil) as a reward. When needed, we schedule an additional in-person meeting at the scanning facility prior to the baseline scan to familiarise the child and family with the environment and the researchers.
Additional exploratory outcomes added after the initial protocol include microbiome data and performance in behavioural prediction tasks (see Supplementary methods).
Sample size and power
Sharda et al. reported a mean difference of 4.84 and an effect size of d=0.34 on the CCC-2 (SD not reported, but calculated as 4.84/0.34≈14.2). Our previous Cochrane review showed similar effect sizes38. We therefore designed M4A to be powered for an effect size of d=0.34. We further assumed outcomes for MT and PT to be correlated with r=0.50. With a two-sided significance level of 5%, a sample of n=70 is required for 80% power; to allow for attrition, we plan to recruit 80 participants. Similar power can be achieved for a range of reasonable effect sizes and correlations (Fig. 3). The same power can be reached with varying combinations of effect sizes, correlations, and numbers of participants with complete data; in contrast, a parallel design would require almost 300 participants (Supplementary Table S2). In summary, the sample size calculation for M4A appears realistic under a range of reasonable scenarios.
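The arithmetic behind these figures can be approximated with a short sketch. It is illustrative only and is not the calculation used for the trial: it assumes a paired comparison of MT and PT within participants, a normal approximation with a small-sample correction, and no period or carryover effects. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_crossover(d, r, alpha=0.05, power=0.80):
    """Approximate number of participants for a paired (crossover) comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # The SD of within-participant differences is SD * sqrt(2 * (1 - r)),
    # so the standardised effect on the differences is d / sqrt(2 * (1 - r)).
    d_z = d / sqrt(2 * (1 - r))
    # The term z_alpha^2 / 2 is a common correction for using t instead of z.
    return ceil(((z_alpha + z_beta) / d_z) ** 2 + z_alpha ** 2 / 2)


def n_parallel_total(d, alpha=0.05, power=0.80):
    """Approximate total number of participants for a two-arm parallel design."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    per_group = 2 * ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 4
    return 2 * ceil(per_group)


print(n_crossover(d=0.34, r=0.50))   # participants needed in the crossover design
print(n_parallel_total(d=0.34))      # participants needed in a parallel design
```

Under these assumptions the sketch gives 70 participants for the crossover design and 274 for a parallel design, in line with the figures stated above. With r=0.50 the SD of the differences equals the SD of the raw scores, which is why the crossover design needs roughly a quarter of the parallel-design sample.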
Randomisation
Computer-generated randomisation lists, with randomly varying block sizes of 4-6, separate for each country, are used to ensure both unpredictability and balance. The lists were generated by a researcher who does not have contact with participants and are stored centrally, concealed from clinical investigators, until a decision on inclusion is made. Once a participant is enrolled, the randomisation result is conveyed to clinical investigators through an online system (REDCap on NORCE servers).
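A minimal sketch of permuted-block randomisation with varying block sizes is shown below. It is not the script used in the trial: it assumes block sizes of 4 and 6 only (so that each block is balanced across the two sequences), and the list length and seeds are hypothetical values chosen for illustration.

```python
import random

SEQUENCES = ("MT-PT", "PT-MT")


def blocked_randomisation_list(min_length, block_sizes=(4, 6), seed=None):
    """Return a list of sequence allocations built from complete permuted blocks."""
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < min_length:
        size = rng.choice(block_sizes)          # block size varies at random
        block = list(SEQUENCES) * (size // 2)   # equal numbers of each sequence
        rng.shuffle(block)                      # random order within the block
        allocations.extend(block)
    return allocations


# One separate list per country (seeds and length are hypothetical).
site_lists = {
    "Norway": blocked_randomisation_list(40, seed=2021),
    "Austria": blocked_randomisation_list(40, seed=2022),
}
for site, allocations in site_lists.items():
    print(site, allocations[:8])
```

Because only complete blocks are generated, the two sequences are exactly balanced at the end of every block, while the varying block size makes the next allocation harder to predict.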
Statistical analysis
All outcomes will be analysed on an intention-to-treat (ITT) basis; statistical models are described in Supplementary methods. Analysis for the present paper was pooled across intervention groups to maintain blinding of researchers. Baseline variables reported by Sharda et al. were pooled across groups using weighted means and pooled standard deviations for comparison with the present sample. Analyses were primarily descriptive and consisted of graphical analyses and summary statistics (absolute and relative frequencies, means, SDs). We examined the quality of fMRI scans using framewise displacement (FD)39, an indicator of head motion, which is a common problem especially in children and clinical populations40. Volumes exceeding a certain FD threshold (0.5 mm39 or other values41) can be downweighted or scrubbed to obtain usable data39, and longer passages of high movement can be removed. Analyses were conducted in R and MATLAB.
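To illustrate how FD can be used to flag high-motion volumes, the following sketch uses one common definition: the sum of absolute volume-to-volume changes in the six realignment parameters, with rotations converted to millimetres on a sphere. The 50 mm radius, the parameter order, and the function names are assumptions made for illustration and do not describe the study's actual pipeline.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Compute FD per volume.

    motion: list of rows [tx, ty, tz, rx, ry, rz], with translations in mm
    and rotations in radians (assumed parameter order).
    """
    fd = [0.0]  # the first volume has no preceding volume
    for prev, cur in zip(motion, motion[1:]):
        translation = sum(abs(c - p) for p, c in zip(prev[:3], cur[:3]))
        # Convert rotations to arc length on a sphere of the given radius.
        rotation = sum(abs(c - p) * radius_mm for p, c in zip(prev[3:], cur[3:]))
        fd.append(translation + rotation)
    return fd


def flag_volumes(fd, threshold_mm=0.5):
    """Return indices of volumes whose FD exceeds the threshold."""
    return [i for i, value in enumerate(fd) if value > threshold_mm]


# Toy example with three volumes (values are made up).
motion = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0],   # small translation
    [0.1, 0.0, 0.0, 0.02, 0.0, 0.0],  # rotation of 0.02 rad
]
fd = framewise_displacement(motion)
print(flag_volumes(fd))
```

In the toy example only the third volume exceeds the 0.5 mm threshold, because a rotation of 0.02 rad corresponds to about 1 mm of displacement at a 50 mm radius. The flagged indices could then be passed to a scrubbing or downweighting step.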
Ethics
The study was approved by the Norwegian Regional Committees for Medical and Health Research Ethics (REK Sør-Øst D, 07 May 2021, reference 246145) and the Ethics Committee of the University of Vienna (21 April 2021, reference 00634) before the start of enrolment. Upon recruitment into the study, parents/caregivers of eligible participants provide written informed consent; children assent orally based on oral information and an invitation letter describing the study in simple terms. Terminating participation in the study is possible at any time during the process. All collected data are stored on a password-restricted server. Survey data are either entered directly by parents or teachers or independently entered twice by researchers if collected on paper.