Feasibility outcomes
This feasibility study demonstrated that implementing PLM and PRM with vulnerable delirium patients at the AG ward was feasible and that the MT could successfully conduct music preference assessments. Obtaining music preferences from the legal representatives before engaging in direct, interactive assessments with the patients helped the MT establish a potentially familiar starting point for further assessment based on the dialogic approach and recognition of music examples. Such an approach helped create a personalized environment in which recognition memory could be activated and musical memories retrieved (54). The interactive assessments might also have impacted delirium outcomes prior to the commencement of the MIs and created a confounding variable. They could also have generated expectations regarding upcoming interventions and impacted the participants’ test performance.
Additionally, our study reaffirms prior research indicating hypoactive delirium as the predominant subtype (55, 56), with 68% of our participants exhibiting hypoactive symptoms. Since distinct delirium subtypes present varying symptoms and necessitate different care approaches, it was previously recommended to explore treatment options separately for each subtype (57). This recommendation aligns with our suggestions for further investigation into MIs.
The robust adherence, high TF, relatively high retention rate, consistent dosage delivery, and minimal protocol deviations observed for PLM indicate that this intervention is feasible and well accepted. The absence of intervention-related adverse events, unusual treatment effects, and refusals further suggests that PLM is likely safe. Additionally, descriptive data from the MT’s session notes and checklists indicate that PLM might also be engaging, with patients singing, moving to music or reminiscing with the therapist in nearly all the sessions. As 77% of the participants in the PLM had hypoactive delirium, the last finding might be particularly relevant for further exploration of PLM in the treatment of the hypoactive delirium subtype.
PRM exhibited lower adherence and retention, as well as inconsistent delivery and duration (from 10 to 33 minutes). As discontinuations were mainly associated with patients’ health decline, palliative status and discharge rather than refusals, low adherence and retention might not indicate low acceptability of PRM. However, the data from the MT’s notes showed that the patients were actively engaged in only about 50% of the sessions and that they more often expressed signs of restlessness, lack of interest in music, a desire to fast-forward or switch to the next song, and requests to end the sessions earlier. Such responses suggest that PRM may be experienced as less engaging and more monotonous by the patients due to either the non-interactive MT or the delivery format. This finding aligns with our previous assumption that prolonged exposure to complex musical stimuli delivered from a loudspeaker could lead to habituation and boredom in delirium patients (37, 58).
Despite the 83% success rate, PRM did not satisfy TF due to a persistent breach of one of the protocol’s compulsory items regarding prohibited patient-therapist interaction. The interaction was always patient-initiated and related to their confusion, pain, distress, or the need to converse about their experience of music. Although ethically justifiable for addressing patients’ safety and needs, interactions lowered the overall consistency of treatment delivery. Excluding the MT from the room could mitigate this issue in further research. However, it would raise concerns regarding patients’ safety, as unsupervised music exposure may lead to increased confusion and adverse events. Replacing the MT with another health practitioner could be another alternative, in which case any interaction would represent what could be expected in any other setting.
The trial demonstrated that recruitment and assessment procedures were feasible and accurately identified patients with delirium for inclusion. While ensuring a high level of expertise on the patients and the ward, the involvement of the internal assessors, with their high workload and other commitments, resulted in the participants being recruited only when an assessor was available – two workdays during day shifts, to ensure the completion of the assessments/interventions before the weekend. Coupled with strict inclusion criteria, the internal assessors’ limited availability resulted in a low recruitment rate. The engagement of external assessors available in most shifts seven days a week is advised for future research.
The recommended test battery for the pre-post assessments was efficient, accurate in assessing delirium and its features, and suitable for application at the AG ward. However, the completion time varied among the assessors. There is currently no definitive diagnostic test for delirium, so its detection depends on assessing key features, combining observation, cognitive testing, patients’ medical history and clinical interviews (59). Aside from giving a more specific insight into the trajectory of delirium severity by combining continuous (symptom-related) and dichotomous (delirium yes-no) variables, using harmonized test batteries such as ours contributes to developing more robust, reliable and standardized assessments for delirium and its severity in the future (59). The test battery was also well accepted by delirium patients, with very few refusals, usually related to the severe worsening of their condition. Deviations from the assessment manuals were few and related to either the assessors missing a one-hour time window for assessments or unintentionally omitting some of the tests. However, the lack of post-intervention effects could indicate that the time window for assessments might have been too long and that potential post-intervention effects could have been better captured closer to the end of the interventions. Engaging more flexible external assessors could help address this issue in the future.
Despite the test battery’s high accuracy, efficiency and suitability, the total adherence to the three-day, multiple-measures follow-up protocol was low. However, assessments were mostly missing because assessors were unavailable or because patients had become palliative or been discharged from the ward. Thus, low adherence might not be the right indicator of the feasibility of the follow-up protocol. Individual trajectories of delirium symptoms showed that some participants had a substantially worse post-intervention performance on some of the cognitive/attention tasks. While the MIs might have caused this worsening, it may also be associated with the multiple measurements: although it provided a large amount of data, the comprehensive three-day follow-up protocol might have been a burden for this vulnerable patient group, causing exhaustion, tiredness or boredom and thus negatively impacting their test performance. Therefore, the suitability of the repeated measures design and the length of the follow-up period should be carefully considered in future research.
Clinical outcomes
Our results showed that the participants’ performance on the attention tests improved significantly on day three, when comparing baseline and pre-intervention scores, while most of their other symptoms were similar to baseline. However, without a control group, the observed changes are difficult to attribute to the MIs delivered the previous day, as delirium is usually reversed by treatment of underlying causes (7, 60). Nevertheless, the summary evaluation of individual DSM-5 criteria showed that most participants still had delirium at the end of the intervention period.
No statistically significant pre-post intervention changes or inter-group changes in delirium symptoms were observed for any of the measures. However, the trial was underpowered to detect preliminary effectiveness properly. Accordingly, the CIs for mean differences were wide for most measures, and we cannot exclude the possibility of changes in delirium symptoms associated with the interventions. The percentage of participants who could recall at least one word on the delayed recall tests was very low. Given the small samples as well, testing pre-post intervention and between-group changes in proportions would have provided no conclusive findings and was omitted. No significant differences in LOS and intake of PRN medication between the groups were expected, as the study was underpowered to provide conclusive findings in this regard, and changes in these measures could be correlated with many other factors. The follow-up of delirium duration after the intervention period was unsuccessful due to the transient and fluctuating nature of delirium, which made it difficult to ascertain whether patients had recovered.
Despite not showing sensitivity to the MIs, clinical outcomes tested in this trial are still highly relevant for detecting changes in delirium progression and severity and should be included in the future. However, to capture the potential effects of the MIs, other complementing outcomes, such as biomarkers, patient-centred outcomes (emotional responses and engagement), or environmental outcomes related to the medical ward and staff, should be considered and explored. Data from the MT’s session notes and checklists indicated that relevant intervention-related changes might also have occurred during the MI sessions, and it is, therefore, recommended that future trials consider assessing these changes more systematically.
Testing clinical outcomes did not provide sufficient information to discern which of the two MIs could more effectively impact delirium symptoms. The wide CIs are mainly associated with small samples but indicate that the potential effect of PLM and PRM interventions cannot be ruled out.
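The link between small samples and wide CIs can be illustrated with a minimal sketch. It is not the analysis used in this trial: it assumes two independent groups of equal size, a common standard deviation, and a normal approximation (a t-based interval would be slightly wider at small n). The function name and the group sizes are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n_per_group, confidence=0.95):
    """Approximate half-width of a CI for a difference in means.

    Assumptions (illustrative only, not taken from the trial):
    two independent groups of equal size, a common SD, and a
    normal approximation to the sampling distribution.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the difference between two group means
    standard_error = sd * sqrt(2 / n_per_group)
    return z * standard_error


if __name__ == "__main__":
    # Hypothetical group sizes; the SD is set to 1 so that the
    # half-width is expressed in SD units.
    for n in (10, 30, 100):
        print(n, round(ci_half_width(1.0, n), 2))
```

Under these assumptions the half-width shrinks with the square root of the group size, so tripling the number of participants per group narrows the interval by a factor of about 1.7 rather than 3.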
Strengths and limitations
This trial was sufficiently powered to evaluate feasibility outcomes. However, it was underpowered to investigate preliminary efficacy and between-group differences, for which a sample of a minimum of 30 patients per group is recommended (61). The control group was omitted, and comparing two active arms was prioritized due to our primary aim of evaluating the feasibility and uncertainties regarding delirium diagnosis and recruitment. As the design of the previous music and delirium trials has been shown to be of low to moderate methodological quality (62), with feasibility appraisals mostly missing, the main strength of our trial is its focus on feasibility and the valuable insights it provides for improving the design of future trials.
Other strengths of this study are: 1) the use of previously validated and recommended delirium assessment procedures (2, 41, 43, 44, 59), 2) training assessors, and 3) involving an experienced delirium researcher to interpret the assessed features. Subtyping delirium is also a strength, and it has been previously recommended for its clinical and prognostic significance in treatment studies (63–65). However, due to small samples, we were not able to conduct separate subgroup analyses with hypoactive and hyperactive delirium patients. We recommend that future studies address delirium subtypes separately, as they may need to be managed differently. Using the staff employed at the site during their usual working hours was a limitation; it resulted in slow recruitment and missed assessments as they were dealing with competing priorities.
Using detailed, standardized intervention protocols with theoretical rationale for comparison and evaluating TF are also strengths, as they increase the generalizability of effect findings (33). Other strengths are engaging a trained MT and conducting music preference assessments to personalize the interventions and increase relevance and safety, which aligns with previous recommendations (33).
In conclusion, the feasibility of recruitment procedures, music preference assessments, MIs and assessment protocols was indicated, and the results showed that the PLM intervention is more engaging, better accepted, and potentially more suitable for further testing with acutely ill older patients with delirium. Recommended next steps are to undertake a pilot study with a comparative group, assess preliminary efficacy, estimate the size of the treatment effects, and further explore different intervention dosages and frequency of delivery.