Study design
This is a single-arm, proof-of-concept study.
Participants
Patients with hoarseness referred by GPs to the ENT Department, Queen’s Medical Centre Campus (QMC), Nottingham University Hospitals, will be screened, and those with a clinical diagnosis of MTD will be reviewed in a joint voice clinic in the ENT Department for more specialist assessment. A diagnosis of Primary MTD will be made by consensus between an experienced Laryngologist and SLT-VT following detailed medical evaluation including laryngostroboscopic examination. Consecutive patients with a confirmed diagnosis of MTD due to 'voice abuse/misuse' (Table 1) who meet the inclusion criteria (Table 2) and none of the exclusion criteria (Table 3) will be invited to take part in the study, and 10 will be recruited (see Figure 1 for Flow Chart).
[Insert Figure 1 here]
Table 1: Symptoms and clinical findings compatible with a clinical diagnosis of MTD
1) A history and presentation of the condition compatible with a Primary MTD diagnosis;
2) Presenting vocal symptoms such as hoarseness, change in voice quality, limitations in pitch, loudness, flexibility, and/or stamina of the voice;
3) Absence of organic pathology such as structural abnormalities, neurological and inflammatory conditions on endoscopy;
4) Auditory-perceptual judgment of voice change, which includes one or more of the following characteristics: a voice that is variable or abnormal in quality (strained, pressed, creaky, rough, breathy or asthenic), an abnormal habitual pitch with or without a restricted fundamental speaking frequency range, or abnormal loudness and loudness variability during speech;
5) The presence of muscle soreness, tenderness or other evidence of hyperfunction in the thyrohyoid or cricothyroid space and/or suprahyoid muscles on physical examination;
6) Laryngoscopic findings of a laryngeal hyperfunction pattern (MTP Type I-III) (Figure 2).
|
Table 2: Inclusion criteria
Males & females
18 or above
Clinical diagnosis of primary MTD based on history and laryngoscopic assessment (Type I-III MTD pattern) through joint assessment by an SLT-V and laryngologist
Current voice problems, persistent for greater than 2 months
Severity of disorder: a) VHI ≥ 30 and b) patient wants therapy
Patient willingness to undergo treatment
Agree to undertake the study protocol
|
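The quantifiable inclusion criteria in Table 2 can be expressed as a simple screening check. The following is a minimal sketch for illustration only: the `Candidate` fields and the `meets_inclusion_criteria` function are hypothetical names, and the clinical judgements (the consensus diagnosis, and the exclusion criteria in Table 3) are assumed to be made by the assessing clinicians and passed in as booleans.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical field names that mirror the rows of Table 2.
    age_years: int
    primary_mtd_type_i_to_iii: bool   # consensus diagnosis by SLT-V and laryngologist
    symptom_duration_months: float
    vhi_total: int                    # Voice Handicap Index total score
    wants_therapy: bool
    agrees_to_protocol: bool


def meets_inclusion_criteria(c: Candidate) -> bool:
    """Return True only if every inclusion criterion in Table 2 is satisfied."""
    return (
        c.age_years >= 18
        and c.primary_mtd_type_i_to_iii
        and c.symptom_duration_months > 2
        and c.vhi_total >= 30
        and c.wants_therapy
        and c.agrees_to_protocol
    )
```

A candidate who passes this check would still need to be screened against the exclusion criteria before being invited to take part.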
Table 3: Exclusion criteria
Organic vocal pathology: 1) structural/neoplastic disorders (e.g. carcinoma, cyst, polyp, papilloma, Reinke’s oedema); 2) neurological disorders (e.g. vocal cord palsy, paresis, spasmodic dysphonia); 3) inflammation (e.g. infection, reflux (RFS >7)); or significant relevant systemic disease (e.g. severe COPD) or need for surgery
Significant psychological issues identified during initial assessment (with option to withdraw if discovered during the treatment periods and agreed by both patient and Therapist)
MTD pattern (IV-VI) compatible with significant primary psychological aetiology on laryngoscopy
Transgender voice issues
Previously incompletely treated dysphonia, neurological disease, or upper aerodigestive tract malignancy
Previous VT or CVT training, or pharmacological treatment for the voice problem (other than proton pump inhibitors or an alginate recommended for laryngopharyngeal reflux-related symptoms)
A hearing impairment that would prohibit or impact on telepractice treatment
Significant concomitant health problems affecting voice
Does not have, or is unable to use, a computer with video link at home or in hospital, even with support
Unable to commit to the study protocol
|
Sample size
The sample size of ten was based on similar studies in the extant literature (Table 4). Based on these data, the mean aggregate pre-treatment VHI score was 50.23 (Range: 65.62±13.61 - 38±21.03) with a mean SD of 19.3, and the post-treatment mean was 28.54 with a mean SD of 16.0. This gives an average improvement of 21.67. If non-inferiority is assumed and the minimal detectable effect is set at 15, with a Type I error rate α = 0.05 and power (1-β) of 0.80, then the sample size is calculated at nine (75). Allowance has not been made for dropouts. This information will be useful in the overall assessment of proof-of-concept.
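The protocol does not state which formula or which SD was entered into the calculation (it is attributed to reference 75). As a hedged illustration only, the sketch below uses the standard normal-approximation formula for a single mean change with a two-sided α, and assumes an SD of 16.0 (the post-treatment mean SD). The function name `paired_sample_size` is hypothetical. Under these assumptions the result is nine, which matches the stated figure, but this should not be read as the authors' actual method.

```python
import math
from statistics import NormalDist


def paired_sample_size(delta: float, sd: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size for detecting a mean pre/post change.

    n = ((z_{1-alpha/2} + z_{power}) * sd / delta) ** 2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sd / delta) ** 2)


# Assumed inputs: minimal detectable effect of 15 VHI points, SD of 16.0.
n = paired_sample_size(delta=15, sd=16.0)
print(n)
```

The result is sensitive to the SD chosen: with the pre-treatment mean SD of 19.3 the same formula gives 13 rather than 9.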
Table 4: Summary of Pre- and Post-treatment studies using VHI as an outcome measure
Author | Reference | Type of VT | Treated (n) | Pre-treatment VHI score, Mean (SD) | Post-treatment VHI score, Mean (SD) | Raw difference in means | Percentage change in score (%)* | 100% - percentage change in VHI score (improvement, %) | Effect size (Cohen’s d)
Aghadoost et al, 2020 | (76) | Circum-laryngeal Therapy | 8 | 70.25 (13.61) | 27.81 (6.62) | 42.44 | 39.6 | 60.4 | 3.966
Aghadoost et al, 2020 | (76) | Voice Facilitating techniques | 8 | 65.62 (10.08) | 27.50 (9.22) | 38.12 | 41.9 | 58.1 | 3.946
Watts et al, 2015 | (77) | VHA and Stretch & Flow VT | 10 | 62.3 (24.4) | 30.3 (15.5) | 32 | 48.6 | 51.4 | 1.566
Rosen et al, 2000* | (78) | VT | 10 | 36 | 18 | 18 | 50 | 50 |
Wenke et al, 2021 | (42) | Standard VT | 8 | 48.3 (25.65) | 24.38 (15.7) | 23.82 | 50.5 | 49.5 | 1.125
Wenke et al, 2014 | (79) | Standard VT | 7 | 38.00 (21.03) | 25.42 (17.18) | 12.58 | 66.9 | 33.1 | 0.655
Wenke et al, 2021 | (42) | Intensive VT | 8 | 50.50 (17.10) | 35.38 (16.27) | 15.12 | 70.1 | 29.9 | 0.906
Wenke et al, 2014 | (79) | Intensive VT | 7 | 49.14 (15.56) | 35.42 (20.9) | 13.72 | 72.1 | 27.9 | 0.745
 | | | Mean | 50.23 | 28.54 | 21.67 | 57.66 | 42.34 | 0.95
 | | | Mean SD | 19.3 | 16.0 | | | |
Watts et al, 2015** | (77) | VHA | 10 | 65.8 (25.9) | 59.1 (26.6) | 6.7 | 89.8 | 10.2 | 0.255
Studies have been ranked for treatment effect and when there were two arms to the study, they have been listed separately.
*No SD values given
** This group of patients acted as a control and were given VHA at treatment
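The derived columns of Table 4 can be recomputed from the reported means and SDs. The sketch below is illustrative: the function names are hypothetical, and the pooled SD is assumed to be the root mean square of the pre- and post-treatment SDs, a choice that appears to reproduce the tabulated Cohen's d for the rows with SDs that were checked (for example the first Aghadoost et al row and the Watts et al treatment row). The "percentage change" column appears to be the post-treatment mean expressed as a percentage of the pre-treatment mean.

```python
import math


def percent_of_baseline(pre_mean: float, post_mean: float) -> float:
    """Post-treatment mean expressed as a percentage of the pre-treatment mean."""
    return 100.0 * post_mean / pre_mean


def cohens_d(pre_mean: float, pre_sd: float,
             post_mean: float, post_sd: float) -> float:
    """Cohen's d, assuming the pooled SD is the root mean square of the two SDs."""
    pooled_sd = math.sqrt((pre_sd ** 2 + post_sd ** 2) / 2)
    return (pre_mean - post_mean) / pooled_sd


# First row of Table 4 (Aghadoost et al, 2020, circum-laryngeal therapy).
pre_mean, pre_sd, post_mean, post_sd = 70.25, 13.61, 27.81, 6.62
pct = percent_of_baseline(pre_mean, post_mean)
print(round(pre_mean - post_mean, 2))    # raw difference in means
print(round(pct, 1))                     # percentage column
print(round(100 - pct, 1))               # improvement column
print(round(cohens_d(pre_mean, pre_sd, post_mean, post_sd), 3))
```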
Peer review and Patient & Public Involvement
The project was discussed with the ENT SLT team at NUH and externally with three experienced Voice Therapists, each with over twenty years’ experience. The CVT treatment protocols and assessment methods were discussed with ten other experienced CVT teachers at the Complete Vocal Institute in Copenhagen. Initial peer review was provided by the Innovation Fund Denmark when funding was successfully obtained for the whole project.
The project was also discussed at an NUH Patient & Public Involvement (PPIE) ‘drop-in’ session in December 2021. The group consisted of local experienced research reviewers. Feedback was positive: the project was considered interesting and the aims clear. There was a question about governance, safeguarding and supervision of the CVT-P. The panel was reassured that all non-clinical members of the study team and the CVT-P would have honorary Trust contracts, be subject to Trust employment terms and conditions, and undergo GCP training, and that therapy would be overseen jointly by the study SLT-V (AW) and CVT supervisor (CS). The study SLT-VT will observe some of the sessions and conduct a progress review after every two therapy sessions. All patients will be able to withdraw from the study at any time, and all will be reviewed at eight weeks to determine whether their goals of therapy have been met and whether further assessment or treatment is required. In addition, the protocol and a questionnaire were given to twelve patients attending the Joint Voice clinic at NUH. Two patients had MTD and the rest had other voice problems. Feedback was positive, and the only comments concerned reassurance about supervision (Table 5).
Table 5: Summary of Patient survey results to questionnaire regarding proposed study by patients attending Joint Voice clinic at NUH
Question | Yes | No | Comments
Having read through the information sheet, would you theoretically consider taking part in this study? | 10 | 2 | • No, because of the distance I have to travel; • No, because I have spasmodic dysphonia
Do you think the aim of this pilot study, to test whether CVT-VT used in performers can help patients with MTD, is a good idea? | 12 | 0 |
Do you have any concerns about patients having therapy with a specialist vocal coach (CVT-VT) rather than a Speech & Language therapist (SLT-VT) in this study? | 1 | 11 | • Yes, Are CVT-Ps as good as SLTs?
Do you have any concerns or comments about receiving your therapy using a video link? | 2 | 10 | • A small concern would be finding a quiet enough space to do it; • I have found video links very stressful when my (spasmodic) dysphonia is bad
If you were a participant, would you be happy, have no strong opinion or be unhappy if the therapy sessions were recorded for more detailed analysis | 0 | 11 | (One patient did not answer)
If you were a participant, would you be happy, have no strong opinion or be unhappy if a specialist Speech and Language Voice Therapist observed the CVT Therapy sessions | 0 | 11 | (One patient did not answer)
Ethics and Consent
Favourable Ethical opinion for conduct has been granted by the East of England - Cambridge South Research Ethics Committee (Reference no. 22/EE/0047). Patients will be screened for major psychological issues and excluded if significant. Some therapy sessions will be observed by the study SLT-V and the logs of therapy reviewed for each patient after every two sessions. Patients deemed eligible for the study will be invited to attend a face-to-face Research Clinic appointment (t=0) for consenting as well as pre-therapy assessment. Following completion of the intervention, participants will be asked to attend a further Research Clinic appointment for post-therapy assessment (t=8), feedback on their experience, and whether goals for treatment were met. Should further therapy be required, this will be arranged in the form of standard SLT-VT.
Interventions
Voice Clinic
Patients identified as having primary MTD seen in the ENT department at NUH will be assessed by a Laryngologist and SLT-VT, have their eligibility checked, and be given a Patient Information Sheet outlining the study. They will be contacted by phone after their clinic appointment to see if they are willing to consent to taking part. Those interested will be invited to attend a Research clinic run by the study Laryngologist and SLT-VT. Those who decline will be referred on to the standard SLT voice therapy pathway.
Research clinic
Patient eligibility will be checked, and written consent obtained. Study questionnaires including a goal-setting checklist will be completed. Acoustic and EGG recordings will be made including MPT and the CRF completed. A post-visit check of the participant’s video link will be made.
Therapy sessions
Pre-therapy
Prior to commencing therapy, all participants will be given Indirect voice therapy using a standard vocal hygiene advice leaflet [Figure 6] and an individual discussion of relevant patient issues by the SLT-V when they attend the Research clinic.
Therapy programme
The therapy provided will be reported in accordance with the TIDieR guidelines (80). Patient contact details will be passed on to the CVT-P, who will contact the patient by email or mobile to arrange video therapy sessions. These will be conducted using the NUH Trust approved platform DrDoctor. Prior to the first therapy session, the CVT-P will review all the patient information (summary case history and clinical findings; voice and EGG recordings; and laryngostroboscopic video recordings). Up to 6 forty-five-minute therapy sessions will be arranged between the CVT-P and patient within an eight-week period. An additional 15 minutes will be allowed for resolving connection, technical, and patient and therapist attendance issues. The outline of the therapy sessions is listed in Table 6.
Table 6: Proposed outline of therapy session plans
Therapy session
|
Session plan
|
Therapy session 1
|
- Go through patient goal setting questionnaire discussed with patient
- Complete goal setting questionnaire part of CVT-P goal setting & CVT experience feedback questionnaire
- Complete pre-treatment CVT Speech assessment form
- Formulate management plan
- Conduct therapy
- Complete detailed Log of Therapist treatment form at end of session
- Ensure all documents are saved on the Network shared drive
- Give ‘homework’ to patient
- Arrange next appointment with patient and book using DrDoctor
|
Therapy sessions 2-5
|
- Review case details and documentation as necessary
- Formulate management plan
- Conduct therapy
- Complete detailed Log of Therapist treatment form at end of session
- Ensure all documents are saved on the Network shared drive
- Give ‘homework’ to patient
- Arrange next appointment with patient and book using DrDoctor
|
Therapy session 6
|
- Review case details and documentation as necessary
- Formulate management plan
- Conduct therapy
- Complete detailed Log of Therapist treatment form at end of session
- Ensure all documents are saved on the Network shared drive
- Inform JM of completion of the therapy sessions so that the final Research clinic appointment can be arranged
- Complete original CVT-P goal setting & CVT experience feedback questionnaire
- Complete end of therapy CVT Speech assessment form
|
CVT-Voice Therapy Intervention
The CVT intervention involves up to 6 sessions of 60-minute therapy provided by an Authorised CVT-P. The therapy consists of exercises to learn sufficient control over the support system, exercises for obtaining prototypical phonation types based on the volume and quality requirements of the patient, along with additional exercises for dynamic control over loudness, colouring of voice, and expressivity. The therapy uses a systematic building of skills starting with voiceless support exercises, progressing to the ability to connect support to sustained vowels, then towards speech and voicing tasks including pitch change and consonants, building into phrases and sentences. Depending on the patient’s symptoms, goals and desired sound character(s), specific exercises for appropriate and healthy use of the CVT-specific vocal modes that correspond to the desired loudness and quality will be chosen. For low volume or breathy voice symptoms, a more twanged and more intentionally narrowed epilaryngeal space is sought. For hyperfunction symptoms, adjustments in breath control or release of unintentional constriction are sought to counter the hyperfunction and related pressed voice. Moreover, stamina and longevity are addressed by exercises focusing on sustained healthy voice use. CVT techniques and tools addressing the specific needs and impairments of each individual patient will be documented in qualitative notes, allowing for subsequent analysis of the employed intervention strategies. Patients will also be asked to maintain the exercises between therapy sessions, with the CVT-P inquiring about progress at the start of every therapy session.
Objectives
The primary objective of this proof-of-concept study is to determine whether a CVT-VT approach improves vocal symptoms and function in patients with Type I-III Muscle Tension Dysphonia using a validated patient self-assessment questionnaire, the VHI (81).
Secondary objectives include:
- to assess whether a CVT approach improves the voice, voice function and meets the goals of the therapy treatment from the patient perspective,
- to assess whether a CVT approach meets the goals of the therapy treatment from an SLT-V and CVT-P perspective,
- to document and qualitatively evaluate the elements and/or techniques used in the treatment sessions and assess if or how they differ from mainstream SLT-VT techniques and give ‘added value’,
- to assess a variety of secondary outcome measures for the treatment of MTD, based on perceptual, acoustic and electroglottographic (EGG) assessments and self-reported measures, in order to establish which symptomatic, physiologically relevant, and patient-centred measures best reflect improvement in symptoms and daily function for patients.
Baseline and outcome measures
Measurement of good clinical practice should use instruments that are reliable, valid, and responsive to intervention (82), and the measures chosen reflect the expected multidimensional improvements in symptomatology and vocal function. In addition, when assessing new interventions, it is important to evaluate the service from the patient’s perspective, the clinician’s perspective and the healthcare providers’ perspective (83). Only the patient’s and clinician’s perspectives will be addressed in this proof-of-concept study.
PATIENT’S PERSPECTIVE
Two validated questionnaires will be used: the Voice Handicap Index (VHI) (81) and the Vocal Tract Discomfort Scale (VTDS) (6). Together they cover a range of psychosocial and physical voice impairment symptoms, the ability to be heard, and throat symptoms (84). In addition, a questionnaire will be used to assess overall satisfaction with the therapy experience.
Voice Handicap Index (VHI)
Patient-reported outcomes are generally accepted as the most relevant tool for evaluating treatment effectiveness in voice disorders, as they may provide a more meaningful picture of the overall impact of a voice disorder (84, 85). Of these, the VHI (81) has been shown to have one of the best sets of psychometric properties among voice-specific quality of life instruments (86, 87) and will be the primary outcome measure for this study. The VHI has been validated with strong internal consistency and test–retest reliability, and has been used as a functional outcome measure for behavioural voice treatment in clinical practice and in clinical research, which allows cross-study comparisons of treatment response (77, 81, 88-90). VHI scores can range from 0 (no handicap) to 120 (maximal handicap), with scores below 30 generally associated with minimal handicap. A score of thirty was therefore used as the cut-off value in the eligibility criteria for the study. Measures of VHI have been reported in earlier investigations of voice therapy for MTD (see Table 4), demonstrating an average improvement of 21.67 and an average effect size of 0.95 (Cohen’s d).
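The scoring of the primary outcome can be sketched as follows. This is an illustrative example rather than the study's analysis code: it assumes the standard 30-item VHI with each item scored 0 to 4 (which yields the 0 to 120 range described above), and the function names are hypothetical.

```python
def vhi_total(item_scores: list) -> int:
    """Sum the 30 VHI items (each scored 0-4) to a total between 0 and 120."""
    if len(item_scores) != 30:
        raise ValueError("the VHI is assumed to have 30 items")
    if any(score not in (0, 1, 2, 3, 4) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, 3 or 4")
    return sum(item_scores)


def vhi_improvement(pre_total: int, post_total: int) -> int:
    """Pre minus post total; positive values mean a lower (better) score after therapy."""
    return pre_total - post_total


def meets_vhi_cutoff(total: int, cutoff: int = 30) -> bool:
    """Eligibility cut-off used in this study (VHI >= 30)."""
    return total >= cutoff
```

For example, a hypothetical patient scoring 52 before therapy and 30 afterwards would show an improvement of 22 points.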
Vocal tract Discomfort scale (VTDS)
Although voice problems are the most common presenting complaint in patients with MTD, many have associated symptoms of vocal tract discomfort (6). It is thought these relate to increased vocal effort and vocal fatigue (6, 91, 92). The VTDS is a self-administered questionnaire designed to measure the subjective perception of sensory discomfort in the throat (vocal tract) (6). Patients are asked to rate the frequency and severity of eight subjectively different sensations: burning, tightness, dryness, aching, tickling, soreness, irritability, and lump in the throat. Frequency and severity are rated separately on seven-point Likert scales ranging from 0 to 6 (frequency: 0 = never, 2 = sometimes, 4 = often, 6 = always; severity: 0 = none, 2 = mild, 4 = moderate, 6 = extreme) (93). VTDS scores have been shown to correlate with the Total and Physical domain scores of the VHI (94) and to decrease after voice training and vocal hygiene education in teachers (95). In the Persian version of the VTDS, a change of 6.0 points in each subscale following a therapeutic intervention has been interpreted as a real change with a 95% confidence level (96).
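A minimal sketch of VTDS subscale scoring follows. It assumes that each subscale (frequency or severity) is summed over the eight sensations, giving a range of 0 to 48, and that the 6.0-point threshold reported for the Persian version can be used as a rough guide. Both the function names and the transfer of that threshold to this study population are assumptions made for illustration.

```python
SENSATIONS = (
    "burning", "tightness", "dryness", "aching",
    "tickling", "soreness", "irritability", "lump in the throat",
)


def vtds_subscale_total(ratings: dict) -> int:
    """Sum the eight 0-6 ratings of one subscale (frequency or severity)."""
    if set(ratings) != set(SENSATIONS):
        raise ValueError("expected one rating for each of the eight sensations")
    if any(not 0 <= value <= 6 for value in ratings.values()):
        raise ValueError("each rating must lie between 0 and 6")
    return sum(ratings.values())


def exceeds_change_threshold(pre_total: float, post_total: float,
                             threshold: float = 6.0) -> bool:
    """True if the reduction in a subscale total reaches the assumed threshold."""
    return (pre_total - post_total) >= threshold
```

The same two functions would be applied separately to the frequency and severity subscales at t=0 and t=8.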
A questionnaire to evaluate the acceptability of the approach to patients
A non-validated questionnaire to evaluate the perceived effectiveness of the therapy, whether the goals of treatment were met, and patient satisfaction with the therapy and the use of telehealth will be administered at the end of the treatment period, following up to six sessions of therapy (Figure 7).
CLINICIAN’S PERSPECTIVE
Clinical assessment of vocal function will be multidimensional and include: 1) laryngostroboscopic video examination, 2) Auditory-perceptual evaluation, 3) Aerodynamic assessment, 4) Acoustic and Electroglottographic analysis of sustained vowels, short phrases, and connected speech.
1. Laryngostroboscopic video examination
All patients will have a nasoendoscopic laryngostroboscopic examination prior to recruitment to rule out organic pathology and confirm the pattern of Muscle Tension Dysphonia.
2. Auditory-perceptual evaluation
Auditory-perceptual evaluation of voice by expert trained listeners is a subjective judgment on the type and severity of the dysphonic quality present (97). No auditory-perceptual rating is perfect (98, 99), but CAPE-V is widely used in both clinical and research settings as it provides a finer judgment of voice quality than the four-point ordinal scale used in GRBAS (100). CVT have developed their own speech assessment rating using CVT-specific terminology, which will also be used.
a. CAPE-V
The Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) is a standardized perceptual measurement of voice that provides an overall rating of severity as well as discrete ratings of specific vocal parameters including roughness, breathiness, strain, asthenia (vocal weakness), pitch, and loudness (101). To overcome the reduced intra-rater and inter-rater agreement associated with the increased freedom of judgement, training and external anchors will be provided.
All voice samples will be rated by four experienced SLT-Vs, each with at least 10 years’ experience of voice evaluation and treatment, who will undergo a brief refresher training programme in the use of CAPE-V to improve inter-rater reliability (102, 103) and who will be blinded to whether the recording is pre- or post-therapy. Twenty percent of samples will be re-rated. Recordings will consist of sustained vowels and the CAPE-V sentences, which are the same stimuli used for acoustic/EGG analysis. Samples will be randomly ordered and coded. The severity of each judgment will be quantified by an "X" mark through a 100-mm horizontal line, where the far left end of the line represents normal (a rating of 0) and the far right end represents most abnormal (a rating of 100) (101, 104, 105). Listeners will rate the perceptual dimensions of (a) overall severity, (b) roughness, (c) breathiness, and (d) strain in a similar manner to that of Kapsner-Smith et al. (2015) (106). The mean rating of the four judges for each recording will be used as the data point for individual patients.
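The averaging step described above can be sketched as follows. The data layout (one dictionary of 0 to 100 mm ratings per judge) and the function name are assumptions made for illustration. The handling of the re-rated 20% of samples is not specified in the protocol and is not shown.

```python
from statistics import mean

DIMENSIONS = ("overall severity", "roughness", "breathiness", "strain")


def mean_capev_ratings(ratings_by_judge: list) -> dict:
    """Average each dimension's 0-100 mm rating over the judges for one recording."""
    for ratings in ratings_by_judge:
        for dimension in DIMENSIONS:
            if not 0 <= ratings[dimension] <= 100:
                raise ValueError("ratings must lie on the 0-100 mm line")
    return {
        dimension: mean(ratings[dimension] for ratings in ratings_by_judge)
        for dimension in DIMENSIONS
    }


# Hypothetical ratings from four judges for a single coded recording.
judges = [
    {"overall severity": 40, "roughness": 30, "breathiness": 20, "strain": 50},
    {"overall severity": 50, "roughness": 35, "breathiness": 25, "strain": 55},
    {"overall severity": 45, "roughness": 25, "breathiness": 15, "strain": 45},
    {"overall severity": 55, "roughness": 30, "breathiness": 20, "strain": 50},
]
recording_scores = mean_capev_ratings(judges)
```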
b. CVT Speech Therapy assessment rating (Figure 8)
The CVT Speech assessment rates three overall groups of parameters: (1) descriptive technical parameters, (2) additional speech parameters, and (3) parameters for detecting issues. All parameters are rated on a four-point scale: 0 = not at all, 1 = mild, 2 = moderate, and 3 = a lot/much. Any speech assessment involves at least (1) the technical parameters and (3) the detection of issues. The (2) additional speech parameters are assessed if deemed necessary by the CVT-P. The three groups of parameters are:
(1) Descriptive technical parameters include descriptions of mode of vibration (mainly), the amount of metallic character in the voice, the degree of density in the voice, the chosen vocal mode and vocal mode variation, whether the speaker is within the centre of the chosen vocal mode, to what degree there are vocal effects present, to what degree there are rough vocal effects present, degrees of voice instabilities, and degree of strain.
(2) Additional Speech Parameters include rating of sound colour, amount of twang, the speed of speech, the pitch, pitch variation, accentuation/stressing of words, volume, and size of larynx.
(3) Parameters for detecting issues include rating the degree of support energy / effort, the degree of economising breath, assessment of inhalation, the opening of the mouth, and a final conclusion describing the assessed main issue to be addressed.
3. Aerodynamic measures: Maximum Phonation Time (MPT)
MPT is a simple and inexpensive aerodynamic voice parameter for measuring glottal competence and is expressed in seconds (100). The patient is asked to inhale deeply and then sustain a steady /ɑ/ ‘ah’ vowel for as long as possible. The longest duration of three consecutive attempts will be selected as the MPT measure for analysis. MPT will be measured from the time axis on the Speech Studio recording. The change from pre- to post-therapy value will be recorded and analysed with each subject acting as their own control, to account for individual variation and the recognised significant difference between MPT values in men and women (107). MPT can be used with caution as a measure of laryngeal dysfunction when inadequate glottal airway resistance is suspected, and provides an indicator of the degree of ‘physiological support’ for speech (107). However, it does not distinguish inefficient glottal valving from poor respiratory reserve and poor driving pressure of vocal fold vibration (107). Values under 10 seconds are regarded as pathological. In a study of 8 patients with MTD using Stretch-and-Flow voice therapy by Watts et al (90), there was a mean improvement in MPT from 12.36±3.61 to 15.49±4.33 (p=0.14) with a significant clinical effect (medium effect size of d=0.79). It was postulated that the improvement was due to improved control and coordination of the respiratory and laryngeal mechanisms associated with reduced physiological effort. It is hypothesised that MPT values will increase post-CVT therapy.
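The MPT rule described above (longest of three attempts, a 10-second threshold, and within-subject change) can be sketched as follows. The function names and example durations are hypothetical.

```python
def mpt_seconds(attempts: list) -> float:
    """Return the longest of the three sustained-vowel attempts, in seconds."""
    if len(attempts) != 3:
        raise ValueError("exactly three consecutive attempts are expected")
    return max(attempts)


def is_below_threshold(mpt: float, threshold_s: float = 10.0) -> bool:
    """Values under 10 seconds are regarded as pathological in the protocol."""
    return mpt < threshold_s


def mpt_change(pre_attempts: list, post_attempts: list) -> float:
    """Post minus pre MPT, with each participant acting as their own control."""
    return mpt_seconds(post_attempts) - mpt_seconds(pre_attempts)


# Hypothetical durations (seconds) at t=0 and t=8 for one participant.
pre = [8.4, 9.6, 9.1]
post = [11.8, 12.5, 12.0]
change = mpt_change(pre, post)
```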
4. Acoustic and EGG measures
Acoustic and EGG measures of voice provide an objective assessment of the vocal function at a specific point in time. The acoustic and EGG measures will be recorded in a treatment room, which exhibits nominal ambient room noise. The recordings will be made at the time of the clinic appointments (t=0, t=8) using a head-mounted omnidirectional microphone placed approximately 3 cm from the left side of the corner of the mouth. The synchronous EGG signal will be recorded from two electrodes placed on either side of the larynx of the subject and inputted into the Laryngograph microprocessor. The signals will be processed using Speech Studio software program (vers. 5.21. Laryngograph® Ltd.) and saved as .wav files. The recordings will then be analysed using the in-built statistical programs.
The following voice parameters based on EGG and Acoustic measures reflecting different aspects of vocal function will be assessed:
1) Sustained vowels /a/ and /i/ + /ɛ/
Acoustic and EGG signal recordings will be made of words containing the four corner vowels: /fi:t/ as in feet, /fu:d/ as in food, /fæd/ as in fad, /fa:rm/ as in farm. They will be analysed using the Multidimensional Voice Profile (MDVP) analysis programme in Speech Studio (vers. 5.21. Laryngograph® Ltd.) (108). The multidimensional measures will include: Average fundamental frequency (Fx), Average vocal fold Contact Quotient (Qx) and mean Sound Pressure Level (mean SPL); a range of perturbation measures (Standard deviation of the fundamental frequency (SD of Fx), Standard deviation of the Contact Quotient (SD of Qx), Jitter Factor, Shimmer dB, cepstral peak prominence (CPP), Relative Amplitude Perturbation (RAP)); and measurements of spectral noise energy versus harmonic energy (Normalised Noise Energy (NNE), Harmonics to Noise Ratio (HNR)) (108). Spectral energy measurements have been reported to be the acoustic measures most correlated with perceptual judgments of roughness, breathiness, and hoarseness. It is hypothesised that perturbation measures and measures of spectral noise energy versus harmonic energy will improve following therapy.
The recorded acoustic signals (.wav files) will also be input into the computer program Praat (109) (Paul Boersma & David Weenink, Institute of Phonetic Sciences, University of Amsterdam, the Netherlands; http://www.praat.org/) to calculate CPPS, the AVQI, and the H1-H2 ratio from the recordings. Unidimensional cepstral acoustic measures such as cepstral peak prominence (CPP) (110) and smoothed CPP (CPPS) (111) have been used as reliable predictors of dysphonia, with values falling as dysphonic severity increases (112). CPP has been found to have better sensitivity, specificity, and positive and negative predictive values than jitter, shimmer, and NHR. CPP is the prominent peak with the highest amplitude representing the fundamental frequency. CPPS has also been shown to be a good measure of vocal fatigue in patients with hyperfunctional voice disorders (113). Acoustic measurement of smoothed cepstral peak prominence (CPPS) will be made on the stabilised one-second mid-portion of the sustained /a/ vowel (CPPS-/a/) and on the CAPE-V phrase ‘We were away a year ago’ (CPPS-s).
In addition, the mean level difference in decibels (dB) between the first and second harmonics (H1-H2) of all voiced segments will be measured for the vocal tasks. H1-H2 is a low-bandwidth measure of spectral tilt that provides an estimate of vocal fold closure during phonation (114). Less abrupt/reduced vocal fold closure associated with a breathier voice quality is reflected in larger differences between the two harmonics (higher H1–H2); conversely more abrupt or increased vocal fold closure is associated with a more strained voice quality and smaller differences between the two harmonics (lower H1–H2) (115-117). H1-H2 differences could therefore provide an additional measure of change in glottal contact in response to therapy(118).
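To make the H1-H2 measure concrete, the sketch below computes the level difference between the first two harmonics of a synthetic tone. It is a simplified illustration and not the Praat-based procedure planned for the study: the fundamental frequency is assumed to be known in advance, the analysis window is assumed to contain a whole number of periods, and the function names are hypothetical.

```python
import math


def harmonic_level_db(signal, sample_rate, freq_hz):
    """Level (dB) of one frequency component via a single-frequency Fourier sum."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(signal))
    amplitude = 2 * math.hypot(re, im) / n
    return 20 * math.log10(amplitude)


def h1_minus_h2(signal, sample_rate, f0_hz):
    """Difference in dB between the first harmonic (f0) and the second (2 * f0)."""
    return (harmonic_level_db(signal, sample_rate, f0_hz)
            - harmonic_level_db(signal, sample_rate, 2 * f0_hz))


# Synthetic tone: f0 = 200 Hz with a second harmonic at half the amplitude,
# sampled at 8 kHz for 0.1 s (a whole number of periods of both harmonics).
sample_rate, f0 = 8000, 200.0
tone = [
    math.sin(2 * math.pi * f0 * i / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / sample_rate)
    for i in range(800)
]
difference_db = h1_minus_h2(tone, sample_rate, f0)
```

Halving the amplitude of the second harmonic corresponds to a difference of about 6 dB, so a larger H1-H2 here reflects a relatively weaker second harmonic, in line with the breathier end of the range described above.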
2) the sentence “We were away a year ago” (from CAPE-V)
Other comparative pre- and post-treatment spectral analyses will also be performed using Long-term Average Spectra (LTAS). LTAS is a fast Fourier transform-generated power spectrum of the frequencies comprising a speech sample. Thus, the LTAS is a composite signal representing the spectrum of the glottal source as well as the spectrum or resonant characteristics of the vocal tract. LTAS holds promise as an acoustic index of voice quality (119). For example, relatively weak harmonic energy in the higher frequencies of the speech spectrum and a corresponding increase in spectral tilt are characteristic of breathy or hypo-functional signals (110, 120). In contrast, excessive vocal fold impact and turbulent noise, both of which have been noted in functional dysphonia, are associated with relatively greater energy in the higher frequencies of the speech spectrum (121).
3) a read passage of the text ‘Arthur the Rat’
It has been argued that connected speech is a better reflection of vocal function compared to sustained vowels (108). A range of statistical measures based on EGG and acoustic measures of connected speech in Speech Studio™ describe different aspects of vocal fold vibration and function (108, 122). These include:
Mode speaking fundamental frequency (Fx) (Hz)
Coherence %
Mode Loudness level (Ax) (dB) + 80% dB range
Coherence %
Mode contact quotient (Qx) % + 80% Contact % range
Coherence %
Irregularity scores Fx, Ax, Qx (%)
Speech Range Profile (80%)
80% Minimum and maximum Intensity (dB)
80% Maximum - Minimum frequency range converted to Semitones (Semitone range)
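The conversion of a frequency range to semitones mentioned in the last item above follows the standard relation of 12 semitones per doubling of frequency. A minimal sketch, with a hypothetical function name and example values:

```python
import math


def semitone_range(f_min_hz: float, f_max_hz: float) -> float:
    """Frequency range in semitones: 12 * log2(f_max / f_min)."""
    if f_min_hz <= 0 or f_max_hz < f_min_hz:
        raise ValueError("expected 0 < f_min_hz <= f_max_hz")
    return 12 * math.log2(f_max_hz / f_min_hz)


# Hypothetical 80% speaking range of 110 Hz to 220 Hz (one octave).
octave = semitone_range(110.0, 220.0)
```

Expressing the range in semitones rather than hertz makes ranges comparable between speakers with different habitual pitch.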
Acoustic formant frequency measures of Vocal tract length and formant space:
There is radiographic evidence that individuals with vocal hyperfunction exhibit a significantly higher laryngeal position and reduced hyolaryngeal space with consequent shorter vocal tract lengths (VTL) than individuals with healthy voices (32, 123, 124). Raising of the larynx is a consequence of increased extrinsic laryngeal muscle activation, specifically, activation of the thyrohyoid, digastric, stylohyoid, mylohyoid, geniohyoid, hyoglossus, and/or genioglossus muscles (125). Changes in VTL cause a change to all formant frequencies, with a shorter VTL corresponding to increased formant frequencies (124, 126, 127). A simple relationship between VTL and formant frequency can be derived by modelling the vocal tract as a uniform tube that is closed at one end and open at the other, which exhibits odd quarter-wave harmonic resonances (127, 128). More reliable estimates can be made using higher formant frequencies (F3 and F4) as they are more stable (129, 130). A high larynx can lead to restricted tongue movements and vocal tract shaping which can impact on clarity of vowel formation (balance between F1 and F2 formants). It has been shown that formant frequencies for corner vowels are dependent on multiple subject and phonetic context factors (125) but within subject changes secondary to therapy for example could be potentially detectable if all other factors are kept constant. Changes in the formant frequencies from the sustained vowels and from the extracted corner vowels from stable parts of the read passage (‘Arthur the rat’) will be compared pre and post therapy.
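The uniform-tube relationship described above can be written as F_n = (2n - 1)c / (4L), so that L = (2n - 1)c / (4F_n). The sketch below applies it to a single formant. The speed of sound (35,000 cm/s) is an assumed round value, the function name is hypothetical, and the formant values are illustrative rather than study data.

```python
def vtl_from_formant(formant_hz: float, n: int,
                     speed_of_sound_cm_s: float = 35000.0) -> float:
    """Vocal tract length (cm) from the n-th formant of a uniform tube
    closed at one end and open at the other: L = (2n - 1) * c / (4 * F_n)."""
    if formant_hz <= 0 or n < 1:
        raise ValueError("expected a positive formant frequency and n >= 1")
    return (2 * n - 1) * speed_of_sound_cm_s / (4 * formant_hz)


# Illustrative values: a third formant of 2500 Hz before therapy and
# 2400 Hz after therapy (a lower formant implies a longer estimated tract).
vtl_pre = vtl_from_formant(2500.0, n=3)
vtl_post = vtl_from_formant(2400.0, n=3)
```

Because the model treats the vocal tract as a uniform tube, the estimate is best read as a within-subject index of change rather than an anatomical measurement.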
4. Happy birthday to you
This song has been chosen as it is one of the most widely recognised songs in the English language. However, it is technically quite difficult for untrained singers, as it has a high note an octave above the starting note (a seven-note jump in the musical scale) and small intervals that lie close to each other. Although reaching the top note does depend on the starting note, it is a reasonably good measure of the flexibility of the voice. Acoustic and EGG recordings will be made while the participant sings the four lines. Each line will be analysed for changes in the LTAS spectral slopes pre- and post-treatment.
Clinician questionnaire
A non-validated questionnaire will be used to evaluate the perceived effectiveness of the therapy from the therapist’s perspective (Figure 9).
Qualitative analysis
Observation of CVT therapy sessions by SLT-V
All patients will be asked to give specific consent for observation of their therapy sessions by the experienced study SLT-V. For practical reasons, not all those who give consent will be observed; the aim is to observe at least one patient through the 6 weeks of their therapy and to sample a further six therapy sessions at different stages of therapy. A deductive approach will be applied for qualitative data analysis.
Direct observation of the CVT therapy sessions will be made by an experienced SLT-V. Observations will be based on the Rehabilitation Treatment Specification System (RTSS) (131) applied to the treatment of voice disorders as described by Van Stan et al. (2015) (23, 132). The RTSS-Voice identifies which clinician actions (‘ingredients’) actively improve specific patient functions (‘targets’) (23, 132, 133). This system will be applied to CVT-VT to help identify which ‘ingredients’ are being used and whether they differ from traditional SLT-VT techniques. Further observations regarding the delivery methods, the type of feedback, progression rules and dosing of the ingredients will be made in line with the RTSS framework (23, 132). An attempt will be made to outline the hypothesised ‘Mechanisms of action’ and how both the ‘ingredients’ and ‘targets’ are linked to the patient ‘Aims’. This will enable a critical comparison of the techniques employed by the CVT-P with traditional SLT-VT techniques. In addition, observation provides a level of governance to ensure the patients’ goals are being met.
Transcriptions of therapy sessions
Therapy sessions will be transcribed verbatim, with an explicit focus on detailing the intervention techniques and exercises, i.e., how they are explained, illustrated and exemplified, how they benefit the patient, and their contextual usefulness. A further potential avenue of transcription relates to embodiments, in an attempt to transcribe illustrated movements (e.g., the movement of the breathing apparatus, techniques for adjusting the vocal tract related to the facial muscles, etc.). These transcriptions will be used as supportive analyses to document the interventions, to develop good clinical guidelines for working with CVT interventions, and to assess the ‘ingredients’ of the interventions.
For those patients who agree to their therapy sessions being recorded, the recordings will be made using NUH Trust approved software on an NUH Trust laptop/computer and stored together with the other patient data in a secured research database storage area. The anonymised sessional recordings will then be transcribed by a member of the research team and redacted to exclude any personal or identifiable information. In addition, a qualitative assessment of the anonymised therapists’ sessional treatment records will be made using a preliminary organising framework and initial codes based on the target of the therapy methods applied during the treatment sessions. These codes will be based on physiological principles, laryngeal setting and treatment methods, using the terms found in the voice therapy and CVT literature. Coding will be carried out in the qualitative research management software NVivo, following the principles of Template Analysis (134), a commonly used thematic analytical framework that allows for both a priori and crystallising themes in qualitative analyses. The main aim is to identify the differences and similarities between the SLT-VT and CVT-VT approaches to therapy.
Data handling and analysis plan
We will record and report the participant flow according to the CONSORT guideline and produce a CONSORT flowchart (135). As this is a proof-of-concept study, we expect to analyse recruitment and retention data using descriptive statistics involving both intention-to-treat and actual completed participant data. We shall report recruitment and retention figures together with reasons for loss of participants. The amount of missing data will be recorded to assess the feasibility of using each measure. An important part of the proof-of-concept is also to assess whether six weekly treatment sessions provide adequate input to achieve the patient goals set at the outset of therapy.
The supervising team will monitor progress during treatment, consider any adverse effects and use that information to continue or halt the trial. There will be no data monitoring committee because the number of participants is small and the primary data offer few opportunities for interim analysis. Recruitment and retention data will be analysed using descriptive statistics. Patients will be reimbursed for travel costs for face-to-face research clinic appointments, or if they are required to attend the outpatient clinic for video-linked therapy sessions because their personal equipment is not adequate. The amount of missing data will be recorded and form part of the assessment of the utility of each outcome measure.
Standard quantitative descriptive and inferential statistical methods will be applied to compare pre- and post-therapy measures, using the VHI as the primary outcome measure as well as the selected secondary outcome measures. For the outcome measures, descriptive data will include means and standard deviations for continuous normal data, and medians and inter-quartile ranges for continuous non-normal data. Standard errors/confidence intervals will also be calculated for continuous normal data or percentages. Statistical analyses of audio and EGG recorded data will be performed using Speech Studio (Laryngograph) and the SPSS Statistics package (Vers. 24.0.0.2 IBM Corporation, Chicago, IL). The primary outcome measure (VHI) will be used to determine overall effect size, but other analyses of results will be subject to effect size estimation to help determine complementary objective outcome measures. Together with the trial parameter data (i.e., recruitment, retention, follow-up, and completion rate), these data will be used to determine the sample size necessary to carry out a fully powered randomised controlled trial comparing CVT therapy with more pre-defined voice therapy techniques.
VHI
Paired-samples t tests will be applied to the pre-treatment and post-treatment primary outcome measure scores (total VHI score).
CAPE-V
For the secondary outcome measure CAPE-V, three experienced judges will be asked to rate all voice samples using the first four dimensions (overall severity, roughness, breathiness, and strain) on 100-mm visual analog scales (VAS) (101). The judges will be blinded to pre- and post-treatment status of the voice samples. The voice samples will be presented in random order. The rating forms will be scored, and then scores for each subject will be averaged across the three judges. Average scores will be used for group analyses. Inter-rater agreement will be assessed using procedures described by (136). Agreement equivalent to within 1 point on a 7-point Equal Appearing Interval (EAI) scale will be calculated for each possible pair of raters for each voice sample. On a 100-mm visual analog scale, scores that fall within 7.2 mm will be considered to be in exact agreement on a 7-point EAI, and scores that are within 21.5 mm (7.2 + 14.3 mm) will be considered to be within 1 scale value. Two scores will be considered to agree if they fall within ±21.5 mm on the VAS (probability of chance agreement p = 0.39). The probability of agreement will be calculated by totaling all pairs of scores that agree and dividing by the total number of score pairs. Twenty percent of the voice samples will be randomly selected to be repeated for each judge. Repeat ratings will be used to assess intra-rater agreement, using the same calculation. To test for differences in the four CAPE-V dimensions, a multivariate analysis of variance will be applied to the pre-treatment to post-treatment change (Δ). Paired-samples t tests will also be applied to pre-treatment to post-treatment changes for each of the four dimensions within each group. A multiple comparison adjustment will be made for the four comparisons using the Bonferroni multiple-comparison procedure.
The confidence intervals will be similarly adjusted for four comparisons (0.05 / 4 = 0.0125 per comparison), so each interval reported at the 95% family-wise level will be a 98.75% CI, which is a Bonferroni-adjusted CI.
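The pairwise agreement calculation and the Bonferroni adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the data layout (one list of judges' VAS scores per voice sample) and the example ratings are hypothetical, and the ±21.5 mm window is treated as inclusive.

```python
from itertools import combinations

AGREEMENT_WINDOW_MM = 21.5  # within 1 scale value on a 7-point EAI scale


def pairwise_agreement(ratings_by_sample, window_mm=AGREEMENT_WINDOW_MM):
    """Proportion of rater pairs whose VAS scores agree within the window.

    ratings_by_sample: one list of judges' scores (mm) per voice sample.
    """
    agree = 0
    total = 0
    for scores in ratings_by_sample:
        # Every possible pair of raters for this voice sample.
        for a, b in combinations(scores, 2):
            total += 1
            if abs(a - b) <= window_mm:
                agree += 1
    if total == 0:
        raise ValueError("at least one pair of scores is required")
    return agree / total


def bonferroni_ci_level(family_alpha=0.05, n_comparisons=4):
    """Per-comparison confidence level after Bonferroni adjustment."""
    return 1.0 - family_alpha / n_comparisons


# Hypothetical ratings from three judges for two voice samples.
ratings = [[30.0, 42.0, 55.0], [10.0, 12.0, 40.0]]
print(pairwise_agreement(ratings))
print(bonferroni_ci_level())
```

The same function can be reused for intra-rater agreement by passing, for each repeated sample, the two ratings given by the same judge.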
Acoustic, EGG and MPT measures
Paired-samples t tests will also be applied to the secondary acoustic outcome measures (EGG, acoustic and MPT measures), together with 95% confidence intervals.
To control for Type 1 error in the primary and secondary outcome analyses while maintaining an acceptable level of statistical power, the alpha level for all t-test comparisons will be set at .025. Treatment effects will be assessed by comparing the pre-treatment to post-treatment change (Δ) scores, plotted as a forest plot in order to display the mean Δ for each outcome measure and the corresponding 95% confidence interval. Any significant treatment effects will be followed up with calculations of effect size using Cohen’s d.
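The change-score calculations described above can be sketched as follows. The sketch computes the mean Δ, the paired-samples t statistic and an effect size; it assumes the paired-design form of Cohen's d (mean Δ divided by the standard deviation of Δ), which the protocol does not specify. The p-value and confidence interval need the t distribution and would come from the statistics package, so they are not computed here. The function name and the scores are hypothetical.

```python
import math
import statistics


def paired_change_summary(pre, post):
    """Summarise pre- to post-treatment change for one outcome measure."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    deltas = [after - before for before, after in zip(pre, post)]
    n = len(deltas)
    mean_delta = statistics.mean(deltas)
    sd_delta = statistics.stdev(deltas)  # sample SD (n - 1 denominator)
    if sd_delta == 0:
        raise ValueError("all change scores are identical; t is undefined")
    return {
        "mean_delta": mean_delta,
        "t": mean_delta / (sd_delta / math.sqrt(n)),  # paired-samples t
        "df": n - 1,
        "cohen_d": mean_delta / sd_delta,  # assumed paired-design variant
    }


# Hypothetical total VHI scores for four participants (lower is better).
pre_scores = [60.0, 55.0, 70.0, 65.0]
post_scores = [50.0, 50.0, 60.0, 55.0]
print(paired_change_summary(pre_scores, post_scores))
```

A negative mean Δ here indicates a reduction in score after therapy; the t statistic would then be compared against the critical value for the chosen alpha of .025.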
Measurement reliability for acoustic analyses will be calculated by reanalysis of 10% of the recordings. The recordings chosen for reliability analysis will be randomly selected from the total number of recordings. Pearson product–moment correlations will be used to assess inter-measurer and intra-measurer reliability of the acoustic and EGG measurements.
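The reliability check described above can be illustrated with a minimal sketch of the Pearson product-moment correlation between original and repeat measurements. The function name and the example values are hypothetical; in practice the coefficient would be obtained from the statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mode Fx values (Hz): first analysis vs. reanalysis.
first_pass = [118.0, 205.0, 190.0, 132.0]
second_pass = [120.0, 203.0, 192.0, 130.0]
print(round(pearson_r(first_pass, second_pass), 3))
```

A coefficient close to 1 indicates that repeat measurements rank and scale the recordings consistently, although it does not by itself show absolute agreement between the two passes.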
Other Questionnaires
The total frequency score and the total severity score of symptoms on the Vocal Tract Discomfort scale (VTDS) will be compared pre- and post-treatment using paired-samples t tests. The frequency and the severity of particular symptoms of the VTD scale pre- and post-treatment will be compared by means of the non-parametric Mann-Whitney U test. The frequency and the intensity of particular VTDS items in the respective subgroups of patients pre- and post-treatment will be compared using ANOVA. Finally, the relationships between the results of the VTD scale and the VHI and MPT will be estimated by means of the Pearson r coefficient. Other non-validated questionnaires will be reported using descriptive statistics.
Dissemination policies
The aim of dissemination will be to inform other Speech & Language therapists and CVT practitioners of the outcome of this approach to treating patients with MTD. This will be achieved through scientific conference presentations and workshops. A paper will be written for a peer-reviewed publication.