The present prospective cohort study was conducted in accordance with the Declaration of Helsinki.[67] The KHCC Institutional Review Board approved the study (IRB number 19 KHCC 127).
Participants
Participants underwent a thorough swallowing and speech assessment and were diagnosed with any type of head and neck cancer (HNC) according to the criteria proposed by the Head and Neck KHCC Guidelines.[8] All HNC patients who underwent a fibre optic endoscopic evaluation of swallowing (FEES) examination at the Speech and Swallowing Outpatient Clinic between 2020 and 2022 were included in the study. The total clinical data set consisted of 130 patients (108 males and 22 females), selected from those seen at a routine speech outpatient clinic. Patients were invited to participate if they met the following inclusion criteria: age above 18 years; a new diagnosis of any HNC (including laryngeal cancer, nasopharyngeal cancer, oropharyngeal cancer, hypopharyngeal cancer, thyroid cancer, or neck cancer); and cancer at any stage (T1, T2, T3, or T4).
Exclusion criteria included a medical history of gastroenterological, respiratory, rheumatologic, metabolic, or hematologic disorders and any previously completed radiotherapy, chemoradiotherapy, or multimodality treatment. Patients who had undergone surgery, such as total laryngectomy, tracheostomy, or PEG tube insertion, were also excluded from the study.
Informed consent was obtained from all patients during regular outpatient speech and swallow clinic visits.
Swallowing Protocol
In the dysphagia outpatient clinic, a standardized examination protocol was implemented for regular healthcare assessment. The protocol encompassed a clinical ear, nose, and throat examination conducted by a laryngologist to assess the integrity of the cranial nerves. The A-EAT-10, a validated tool for measuring patients’ perception of their swallowing problems, was also administered.[16] This tool is widely used in daily clinical practice at the outpatient clinic for patients with or without dysphagia of various etiologies, not limited to HNC. Many patients self-reported dysphagia although their swallowing function appeared within the normal range upon assessment; their complaints, such as xerostomia or loss of appetite, seemed to reflect changes from their pre-cancer experience. In other patients, the assessment revealed dysphagia and weight loss before treatment, which significantly affected nutritional status.
The A-EAT-10 is a self-administered, symptom-specific outcome instrument for dysphagia. It comprises ten statements that patients rate on a scale of 0 to 4, with 0 indicating no problem and 4 signifying a severe problem. During their regular visits to the speech clinic, all patients who met the inclusion criteria, with or without dysphagia, were asked to complete the A-EAT-10 questionnaire by themselves. For patients who required assistance due to literacy issues, a speech-language pathologist (SLP) facilitated the process by presenting the questions verbally and recording the responses. The A-EAT-10 offers an overall assessment of dysphagia and its associated symptoms.[16]
A senior SLP and an ENT physician performed the standardized fibre optic endoscopic evaluation of swallowing (FEES) examination. An OtoPront PES PILOT HDpro stroboscope flexible endoscope with a 2.7 mm diameter, equipped with Video Nasopharyngoscopes VN-S and VN-P (CHIP-ON-THE-TIP) and an automatic switching system from Berlin, Germany, was used. All recorded videos were processed using the PILOT system and stored in an anonymous format.[19] Each FEES examination adhered to the clinical practice guidelines for fibre optic examination in the adult population.[20] Patients were comfortably seated in a chair reclined between 75 and 90 degrees, with their arms resting on the armrests and their heads in a neutral position, ensuring the best posture for the examination. Local anaesthetic drugs, such as lidocaine spray, were not used, to avoid altering pharyngolaryngeal sensibility.[21] The endoscope was introduced through the wider nasal cavity and held just below the uvula to maximize the field of view, which included the larynx, the glossoepiglottic valleculae, and the pyriform sinuses.[21] During the FEES examination, three different food textures were administered to evaluate swallowing safety and efficiency:
Liquid
Thin liquid at room temperature (IDDSI Level 0) was used for thin liquid trials.[22]
Moderately thick liquid
Room temperature yoghurt (IDDSI Level 4) was employed for moderately thick liquid trials.[22]
Solid
A quarter and half of an 8-gram piece of dry bread (4 grams per trial; IDDSI Level 7) were used for solid trials.[22]
Some food bolus consistencies were not administered to all patients for safety reasons and to minimize the risk of severe aspiration. Consequently, the study included only subjects who had undergone at least one trial with thin or moderately thick liquid consistencies.
FEES examinations were rated independently by three operators using the video files: two speech and language therapists (SLPs) and one senior ENT physician, all with at least five years of experience in FEES examinations. The raters were blinded to each other’s ratings and to participants’ data, since the videos were stored anonymously. Two independent SLPs rated the videos using validated ordinal scales for swallowing safety and efficiency, and inter-rater reliability between the two raters was analyzed. When the two raters differed by more than one level on a FEES rating scale, a third SLP assessed the videos and decided on both ratings.[27]
The parameter used to analyze the FEES was safety impairment (penetration/aspiration). The severity of penetration/aspiration was rated using the Penetration Aspiration Scale (PAS).[23] The PAS is an 8-point scale ranging from 1 (material does not enter the airway) to 8 (material enters the airway, passes below the vocal folds, and no effort is made to eject it).[21] Penetration was defined as the bolus entering the laryngeal vestibule over the rim of the larynx (PAS score from 2 to 5). Aspiration was defined as the bolus passing below the true vocal folds (PAS score of 6 or above). Swallowing safety was also evaluated, following the approach of Tabor et al.[24] In particular, based on the PAS score, each swallow was classified as unsafe if the material entered the laryngeal vestibule (PAS ≥ 3). In addition, to analyze the timing of unsafe swallows, each event was classified as “before”, “during”, or “after” the swallow. The worst PAS score for each consistency and each subject was used in the statistical analyses.
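The classification rules in this paragraph can be sketched in code. The following is a minimal Python illustration, not the authors’ analysis script (the study used R). The function names and the trial records are hypothetical; only the thresholds (penetration for PAS 2 to 5, aspiration for PAS 6 or above, unsafe for PAS ≥ 3) and the worst-score-per-subject-and-consistency rule are taken from the text.

```python
from collections import defaultdict


def classify_pas(score):
    """Map one PAS score (1-8) to the categories defined in the text."""
    if not 1 <= score <= 8:
        raise ValueError("PAS score must be between 1 and 8")
    if score >= 6:
        depth = "aspiration"      # bolus passes below the true vocal folds
    elif score >= 2:
        depth = "penetration"     # bolus enters the laryngeal vestibule
    else:
        depth = "none"            # material does not enter the airway
    return {"depth": depth, "unsafe": score >= 3}


def worst_pas(trials):
    """Return the worst (highest) PAS score per (subject, consistency) pair."""
    worst = defaultdict(int)
    for subject, consistency, score in trials:
        key = (subject, consistency)
        worst[key] = max(worst[key], score)
    return dict(worst)


# Hypothetical trial records: (subject id, consistency, PAS score).
trials = [
    ("P01", "thin liquid", 1),
    ("P01", "thin liquid", 3),
    ("P01", "solid", 2),
    ("P02", "thin liquid", 7),
]

summary = {key: classify_pas(score) for key, score in worst_pas(trials).items()}
for key, result in summary.items():
    print(key, result)
```

Note that under these definitions a PAS score of 2 counts as penetration but not as an unsafe swallow, which is why the two labels are kept separate in the sketch.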
Table S1 in the supplementary file illustrates the data collection of the A-EAT-10 and the timing of the FEES application. All patients attended regular follow-up visits throughout treatment, and each received ongoing support from an SLP. The SLP’s role included educating and counselling patients about dysphagia symptoms, providing guidance on PEG tube use (if applicable), and administering swallowing exercises.
Statistical Analysis
This analysis works towards validating the A-EAT-10 as a diagnostic tool for dysphagia against the traditional FEES test in 130 HNC patients at our hospital’s speech and swallow clinic. The sample size was calculated based on Cohen’s kappa coefficient for agreement between two raters.[25] The calculation assumed a minimum acceptable kappa (\(\kappa_0\)) of 0.61 (moderate to substantial agreement) and an alternative expected kappa (\(\kappa_1\)) of 0.4 (fair agreement), an expected prevalence of dysphagia of 30% among participants, a power of 80%, and a type I error of 0.05. These assumptions resulted in an estimated required sample size of 132 patients. We recruited 143 participants at the beginning of the study, of whom 130 completed it.[2] Of the 13 patients lost, ten underwent tracheostomy and PEG tube insertion before the start of treatment, following KHCC guidelines; two died during the follow-up period; and one was referred to palliative care. Under our adopted protocol, the kappa coefficient test requires the largest sample size; therefore, this sample size is sufficient for the other tests used in the analysis.[26]
The general theme of the analysis is measuring agreement between diagnostic tools when the gold standard is imperfect, including latent class analysis. The methods in this area can be confusing because of the many theoretical issues involved; the number of alternatives and the lack of consistency in the literature are significant sources of confusion even for statisticians.[27–30] We developed a simple protocol to identify the ‘best’ method(s) for the dataset in hand, considering the nature of the data.[27, 31]
The first consideration is the aim of the comparison: as mentioned earlier, to validate the Arabic version of the EAT-10 (A-EAT-10, a self-report questionnaire) using a traditional diagnostic tool, the FEES test, as a benchmark. The established traditional method does not always reflect the truth and carries some inaccuracy (error). Hence, treating FEES as a ‘gold standard’, i.e., as free of measurement error, would be inappropriate.[32]
The second consideration is dependency. Since the two tests are performed on the same patient, their results may be dependent. The protocol discusses several methods developed under the independence assumption.[33, 34] However, such an assumption can lead to biased or unrealistic results.[33, 35] The advantage of the correction methods that assume the tests are independent is that they are computationally less demanding: under independence, the corrections can be calculated directly.[36–38] Direct calculation is not available when the conditional independence assumption is violated, which is the case in most situations.[35]
Given these considerations, we used measures of agreement. Additionally, we used sensitivity and specificity to assess the validity of the A-EAT-10, accounting for the imperfection of the gold standard (the FEES test) and the dependency in the data. The protocol for analyzing agreement is attached as a supplementary file and can be summarized as follows:
Weighted and unweighted kappa. We treated having dysphagia as the higher-valued category in both tests. Cohen’s kappa measures agreement between categorical variables and can be used to assess agreement between alternative classification methods, as in our case.[25, 39] The statistic was developed mainly for nominal classification; the weighted version applies linear, quadratic, or other weights for ordinal data.[40] We compared the nominal classification (unweighted kappa) with the ordinal one (weighted kappa), treating dysphagia as the higher order because it is associated with a higher score on the A-EAT-10. However, since there are only two outcomes (dysphagia vs normal), both methods produced identical results.[41, 42] The cut-off points for interpreting agreement are subjective, and there is no standard way to assess the strength of agreement.[43, 44] (See tables S3 and S4 in the supplementary file for the criteria.)
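To illustrate why the weighted and unweighted statistics coincide for a two-category outcome, the sketch below computes Cohen’s kappa from a cross-tabulation using the standard definition \(\kappa = (p_o - p_e)/(1 - p_e)\) with agreement weights. It is a Python illustration with a hypothetical 2 × 2 table and a hypothetical function name, not the study data or the R code used by the authors.

```python
def cohen_kappa(table, weights="unweighted"):
    """Cohen's kappa from a k x k cross-tabulation (rows: test A, columns: test B).

    weights: "unweighted", "linear", or "quadratic" agreement weights.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_prop = [sum(row) / n for row in table]
    col_prop = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def w(i, j):
        if weights == "unweighted":
            return 1.0 if i == j else 0.0
        d = abs(i - j) / (k - 1)          # normalised category distance
        return 1.0 - d if weights == "linear" else 1.0 - d ** 2

    p_obs = sum(w(i, j) * table[i][j] / n for i in range(k) for j in range(k))
    p_exp = sum(w(i, j) * row_prop[i] * col_prop[j]
                for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical counts: rows = A-EAT-10 (normal, dysphagia),
# columns = FEES (normal, dysphagia).
table = [[40, 10],
         [5, 45]]

kappas = {name: cohen_kappa(table, name)
          for name in ("unweighted", "linear", "quadratic")}
print(kappas)
```

With only two categories, every off-diagonal cell is at the maximum distance, so linear and quadratic weights both reduce to 1 on the diagonal and 0 elsewhere, and the three values are equal.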
Intra-class correlation (ICC) test with a single random effect.[45] Its advantage over the kappa coefficient is that different sources of randomness can be introduced into the model. In our case, the single random effect (mixed-effects) model treats only the patients as random, while the two assessment methods are fixed.[46] In other words, it assumes that our patients are drawn randomly from a population of patients, which is the case, and that the two classification methods are fixed. In other settings, where the classification methods are themselves chosen from a population of methods (different available tests), the ICC would be two-way random. The interpretation of the ICC is similar to that of kappa. (See table S2 in the supplementary file for the criteria.)
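As a rough illustration of the model with random patients and fixed methods, the sketch below computes a single-measure, two-way mixed ICC from the ANOVA mean squares, \((MS_R - MS_E)/(MS_R + (k-1)MS_E)\). Choosing this consistency form is an assumption made for illustration; the text does not state which ICC variant was reported. The ratings and the function name are hypothetical, and the code is Python rather than the R used in the study.

```python
def icc_two_way_mixed(ratings):
    """Single-measure, two-way mixed, consistency ICC.

    ratings: list of rows, one per patient, each holding k method scores.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between methods
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical binary classifications (1 = dysphagia, 0 = normal):
# column 0 = A-EAT-10, column 1 = FEES.
ratings = [[1, 1], [0, 0], [1, 1], [0, 0], [1, 0], [0, 1]]
icc = icc_two_way_mixed(ratings)
print(round(icc, 3))
```

If the methods were instead treated as a random sample of possible tests, the between-method mean square would enter the denominator, which is the two-way random case mentioned in the text.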
The traditional sensitivity-specificity analysis includes the receiver operating characteristic (ROC) curve and the area under the curve (AUC). However, this approach is adjusted to account for the imperfection of the gold standard, i.e., the FEES test.[32] Because of this imperfection, a straightforward comparison would bias the results.[33] For prevalence rates between 0.1 and 0.9, the available corrections perform similarly in recovering the true values, with the Staquet et al. correction performing best when the tests are conditionally independent.[33] Thus, we consider the Staquet et al.[36] correction, as explained in the supplementary file. Correcting for the imperfection requires the sensitivity and specificity of the imperfect gold standard to be established from previous studies or a test group.[33, 36] Previous studies show that FEES has approximately 80–87% sensitivity and 81–100% specificity in detecting dysphagia,[32, 47] and we correct based on these levels. Our literature search found no established sensitivity and specificity values for FEES specifically in HNC patients. Therefore, we use the lowest and highest levels reported in these studies as our thresholds for the imperfect gold standard: sensitivity (80% and 87%) and specificity (81% and 100%). Alternating between these levels of sensitivity and specificity acts as a sensitivity analysis and provides upper and lower bounds for the sensitivity and specificity of the A-EAT-10.
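The logic of the correction can be sketched as follows. Under conditional independence, the four cell probabilities of the 2 × 2 table (new test by reference test) can be solved algebraically for the new test’s sensitivity and specificity once the reference test’s sensitivity and specificity are fixed. The Python code below implements that closed-form solution; the exact expressions the authors used are given in their supplementary file, so this should be read as an illustration under the stated assumption. The function names, the simulated table, and the "true" values (sensitivity 0.70, specificity 0.90, prevalence 0.30) are hypothetical; only the FEES bounds (0.80 and 0.87; 0.81 and 1.00) come from the text.

```python
from itertools import product


def corrected_accuracy(a, b, c, d, se_ref, sp_ref):
    """Correct a new test's accuracy for an imperfect reference test.

    a: both positive          b: new test positive, reference negative
    c: new test negative, reference positive          d: both negative
    Assumes the two tests are conditionally independent given true status.
    """
    n = a + b + c + d
    ref_pos = a + c
    youden = se_ref + sp_ref - 1.0
    se_den = n * (sp_ref - 1.0) + ref_pos
    sp_den = n * se_ref - ref_pos
    if youden <= 0 or se_den <= 0 or sp_den <= 0:
        raise ValueError("table is incompatible with the assumed reference accuracy")
    return {
        "sensitivity": ((a + b) * sp_ref - b) / se_den,
        "specificity": ((c + d) * se_ref - c) / sp_den,
        "prevalence": se_den / (n * youden),
    }


def expected_table(n, prev, se_new, sp_new, se_ref, sp_ref):
    """Expected cell counts under conditional independence (for checking)."""
    a = n * (prev * se_new * se_ref + (1 - prev) * (1 - sp_new) * (1 - sp_ref))
    b = n * (prev * se_new * (1 - se_ref) + (1 - prev) * (1 - sp_new) * sp_ref)
    c = n * (prev * (1 - se_new) * se_ref + (1 - prev) * sp_new * (1 - sp_ref))
    d = n * (prev * (1 - se_new) * (1 - se_ref) + (1 - prev) * sp_new * sp_ref)
    return a, b, c, d


# Sensitivity analysis over the FEES bounds quoted in the text.
results = {}
for se_ref, sp_ref in product((0.80, 0.87), (0.81, 1.00)):
    cells = expected_table(130, 0.30, 0.70, 0.90, se_ref, sp_ref)
    results[(se_ref, sp_ref)] = corrected_accuracy(*cells, se_ref, sp_ref)
    print(se_ref, sp_ref, results[(se_ref, sp_ref)])
```

Because the tables here are generated from known values, each combination recovers the same sensitivity, specificity, and prevalence. With observed data, the four combinations generally give different corrected estimates, which is what yields the upper and lower bounds described above; estimates falling outside the range 0 to 1 would signal that the assumed reference accuracy or the independence assumption does not fit the data.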
Random Effects Latent Class Analysis (RE-LCA) is employed under the assumption of conditional dependence.[48] This approach assumes two latent classes, with and without the condition (dysphagia vs normal). The actual status of the condition is unknown; the best available reference test, FEES, only indicates it and is therefore referred to as an imperfect gold standard. The actual status of the condition is thus ‘latent’. As in the previous approach, the sensitivity and specificity of FEES are assumed to be known.
The analysis was conducted using R version 4.2.1.[49] The packages used in the analysis were "irr",[50] "psych",[51] "vcd",[52] "DescTools",[53] "ggpubr",[54] "tidyverse",[55] "pROC",[56] "rstatix",[57] and "randomLCA".[58]