Assessment of the degree of agreement between the judgment of tracheotomy score and the clinician: a retrospective analysis

Background: In oral cancer surgery, the decision to perform tracheotomy is often based on the surgeon's experience. Consequently, tracheotomy is sometimes performed in cases that do not necessarily require it. In such cases, the airway is secured, but the patients are exposed to tracheotomy-related complications. Several evaluation methods have been reported to predict the need for selective tracheotomy in patients with oral cancer. In this study, we investigated the competency of clinical scoring systems in identifying patients who require tracheotomy among the oral malignancy cases treated with surgery in our department, and examined the degree of agreement between the surgeon's decision and the scores of various scoring systems. Methods: This study was conducted on 110 patients with oral cancer who were treated with surgery under general anesthesia in the Department of Oral and Maxillofacial Surgery, Nagoya Ekisaikai Hospital, between January 2007 and April 2018. Among them, 67 patients (44 male and 23 female), who were managed by resection and reconstruction, were retrospectively analyzed. To derive the scores, we evaluated the parameters of these indices from clinical records and images. Based on the Cameron and Gupta scores, we divided the patients into two groups, the tracheotomy and no tracheotomy groups, and evaluated the degree of agreement by calculating the κ coefficient. Results: The κ coefficients of the Gupta and Cameron scores were 0.61 (95% CI, 0.4-0.82) and 0.6 (95% CI, 0.38-0.82), respectively. The clinical evaluation of the κ coefficient indicated that the Gupta and Cameron scores agreed substantially. Discussion: These scores matched the surgeon's decision, confirming that they can be applied to decisions on airway management. However, these values are affected by prevalence.
When unilateral total neck dissection and resection of the primary lesion were performed, though it was high-risk, the score was low, and an evaluation

Notably, the causes of airway obstruction include postoperative hematoma, pharyngolaryngeal edema, and morphological changes of the airway; thus, appropriate airway management is required [6,7]. There are three methods of postoperative airway management: 1) extubation, 2) endotracheal intubation under sedation, and 3) tracheostomy. At present, tracheostomy or prolonged intubation remains the major modality of airway management for patients with oral and oropharyngeal cancers undergoing major surgery. However, no clear criteria currently exist for determining which method to select. Therefore, the method is often determined based on the experience of surgeons, considering inter-institutional differences and patient characteristics. According to the literature, tracheotomy is associated with reported complication rates of 8-45% [8].
However, these methods may not be applicable to all cases because of disparities between institutions or differences in patient backgrounds. Whether tracheotomy is performed, which serves as the endpoint, is determined by a physician. Therefore, it is necessary to examine inter-rater agreement on the need for tracheotomy and to plan further standardization of the evaluation process. However, such examinations have not been conducted.
The purpose of this study was to calculate a kappa coefficient to examine the degree of agreement between the physician's subjective evaluation and tracheotomy score evaluations.

Methods
We performed a retrospective analysis of these variables for all patients who had resection and primary flap reconstruction in our department, and included 110 patients with oral cancer (76 males and 34 females) who were treated surgically under general anesthesia at the Department of Oral and Maxillofacial Surgery, Nagoya Ekisaikai Hospital, between January 2007 and April 2018. The study subjects were patients who underwent either broad resection of the primary lesion followed by epidermization, or major composite resection with reconstruction. Cancer staging was performed based on inspection, contrast-enhanced computed tomography, magnetic resonance imaging, and positron emission tomography-computed tomography. Surgical procedures and methods of airway management for all patients were discussed and selected during a tumor conference at our department. The decision on postoperative airway management was made based on the operator's experience. Usually, patients with large tumors (T4), lesions of the floor of the mouth or posterior tongue, or bilateral neck dissection were considered for elective tracheotomy. Various factors were examined for their contribution to the need for tracheotomy, including 1) patient background; 2) resection procedure, inclusion/exclusion of neck dissection, and reconstruction procedure; 3) method of airway management; and 4) scoring based on previously reported indices [8][9][10][11]. The parameters of the indices were evaluated and scored based on the patients' medical and imaging records. Patients scored by the Cameron and Gupta Scores were divided into 2 groups: the group requiring tracheotomy (tracheotomy group) and the group not requiring tracheotomy (no tracheotomy group), and agreement with the actual performance or nonperformance of tracheotomy was evaluated with the κ coefficient.
The parameters were studied for their contribution to the need for tracheotomy. A tracheotomy score, adopted from the scoring system recommended by Cameron (2009), was used to evaluate the state of the patient's airway based on the type of operation. We analyzed the inter-rater agreement between the physician's decision on tracheotomy and the tracheotomy score's evaluation using the kappa statistic.

Ethics
This retrospective cohort study was approved by the Nagoya Ekisaikai Hospital Ethics

Statistical analysis
The kappa coefficient (κ) was used to evaluate agreement between evaluators. The κ coefficient was used instead of the intraclass correlation coefficient because the scores are on an ordinal scale. The agreement between the surgeon's decision and the airway management suggested by the scores was analyzed using the κ coefficient. P values < 0.05 were considered statistically significant. All statistical analyses were performed with EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan), which is a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria). More precisely, it is a modified version of R commander designed to add statistical functions frequently used in biostatistics [13].
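As an illustration of the agreement statistic described above, the following is a minimal Python sketch of Cohen's κ for a 2 × 2 table (score-based group versus the surgeon's actual decision). It is not the EZR procedure used in the study: the function name `cohen_kappa_2x2`, the simple large-sample standard error, and the cell counts are all assumptions chosen for illustration, and the counts are hypothetical rather than the study data.

```python
import math


def cohen_kappa_2x2(both_yes, score_only, surgeon_only, both_no, z=1.96):
    """Cohen's kappa with an approximate 95% CI for a 2x2 agreement table.

    both_yes     : score suggests tracheotomy, surgeon performed it
    score_only   : score suggests tracheotomy, surgeon did not perform it
    surgeon_only : score suggests no tracheotomy, surgeon performed it
    both_no      : score suggests no tracheotomy, surgeon did not perform it
    """
    n = both_yes + score_only + surgeon_only + both_no
    p_obs = (both_yes + both_no) / n              # observed agreement
    score_yes = (both_yes + score_only) / n       # marginal: score says "yes"
    surgeon_yes = (both_yes + surgeon_only) / n   # marginal: surgeon says "yes"
    # Agreement expected by chance from the two marginals
    p_exp = score_yes * surgeon_yes + (1 - score_yes) * (1 - surgeon_yes)
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Simple large-sample approximation of the standard error (an assumption;
    # statistical packages may use a more refined variance estimator)
    se = math.sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))
    return kappa, kappa - z * se, kappa + z * se


# Hypothetical counts, NOT taken from the study
kappa, lower, upper = cohen_kappa_2x2(30, 10, 10, 50)
print(f"kappa = {kappa:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

Because the confidence interval here relies on a simplified variance formula, it may differ slightly from the interval reported by R-based tools for the same table.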

Results
A total of 110 patient records were included in the study. Patient characteristics, including tumor sites, operative approaches, and postoperative courses, are shown in Table 1.
Patients were aged between 42 and 88 years (mean age, 63.4 ± 11.0). Primary lesion excision only was performed for 24 patients; among these, extubation was performed in 21 patients, intubation under sedation was performed in 1 patient, and tracheotomy was performed in 2 patients. Furthermore, primary lesion excision and neck dissection were performed in 30 patients; among these, extubation was performed in 1 patient, intubation under sedation was performed in 8 patients, and tracheotomy was performed in 21 patients. As a result of metastasis, neck dissection only was performed in 13 patients; extubation was performed in all of these patients (Table 1). Detailed results of scores, using the methods reported by Cameron [9] and Gupta [11], are shown in Fig. 1 and Table 2. The number of patients suggested to require tracheotomy was within the range of 0-7 (Cameron Score) and 0-8 (Gupta Score). Scoring using the methods reported by Cameron and Gupta clearly indicates whether a patient requires tracheotomy. Tracheotomy was performed in 9 patients in the no tracheotomy group rated by the Cameron Score. The details were as follows: partial glossectomy and total neck dissection (n = 3), posterior partial glossectomy (n = 2), buccal mucosa and total neck dissection (n = 2), marginal mandibulectomy (n = 1), and segmental mandibulectomy (n = 1). The details of the patients undergoing tracheotomy in the no tracheotomy group rated by the Gupta Score were as follows: partial glossectomy and total neck dissection (n = 3), posterior partial glossectomy (n = 2), hemiglossectomy, forearm flap reconstruction, and total neck dissection (n = 5), subtotal glossectomy, forearm flap reconstruction, and total neck dissection (n = 3), marginal mandibulectomy plate reconstruction (n = 1), total neck dissection and oral floor resection (n = 1), and total neck dissection (n = 1). 
Tracheotomy was not performed in 2 patients in the tracheotomy group rated by the Cameron Score. One was sedated and intubated for total neck dissection, buccal mucosal resection, and mandibular segmental resection; and the other was sedated and intubated for mandibular segmental resection, hard tissue reconstruction, and total neck dissection.
One patient in the tracheotomy group rated by the Gupta Score did not undergo tracheostomy. This patient was intubated for hard tissue reconstruction, total neck dissection, and segmental mandibulectomy. Ten patients in the tracheotomy group rated should be identified such that they can undergo continued intratracheal intubation or selective tracheotomy [16,17]. Typically, intratracheal intubation under sedation is used to maintain the airway between 24 and 48 hours after surgery. Selective tracheotomy is recommended in cases where intratracheal intubation must be maintained for more than 2 days. In a national survey in the UK, tracheotomy was selected for 69% (39/57) of patients who underwent free-flap head and neck reconstructive surgery [18]. However, complications, such as bleeding, occlusion, local infection, and pneumonia occur at a rate of 4-8% in tracheotomy [17][18][19][20]. These complications result in prolonged recovery of the patient and longer hospital stays. Appropriate strategies for airway management, therefore, remain controversial, and selective tracheotomy is determined based on the surgeon's experience, which often varies between individuals. Therefore, research to establish criteria for performing tracheotomy is required. There have been several reports regarding evaluation methods to predict the necessity for selective tracheotomy in patients with oral cancer [9][10][11][12]. However, these methods may not be applicable for all patients due to disparities among institutions or differences in patient backgrounds. In this study, we examined the level of agreement between the need for tracheotomy based on these evaluation methods and the surgeon's decision, as well as cases with a discrepancy between them.
Kruse et al. were the first to report a scoring system that can be used to determine the need for tracheotomy [9]. They scored and evaluated 5 components (tumor localization, tumor size, pathological chest X-ray findings, multimorbidity, and alcohol consumption) to predict the risk of postoperative respiratory failure in 928 patients. was 99.3%. Sensitivity was low in our patients, although selectivity and PPV were greater than 90%. In this system, scores are given for surgery (especially areas of resection) and reconstruction procedures. In terms of prediction of airway management associated with surgery, details of surgery are incorporated within this system in comparison with the other systems; therefore, this system is expected to be useful for surgeons.
Whether high sensitivity or high selectivity is required for these tests depends on the clinical state and study population. Since a number of analyses showed false-negative results depending on the criteria, scores became relatively low in partial glossectomy, as well as in cases where forearm flap reconstruction was performed with pull-through or supraomohyoid neck dissection. It is controversial whether to perform tracheotomy or to simply maintain intratracheal intubation under sedation in such cases. However, it is understood that tracheotomy is not performed in a large number of cases, despite high risks of postoperative airway obstruction. Careful postoperative monitoring and structures for managing emergencies are required for such cases to avoid malpractice. In addition to a system that can be used for distinguishing at-risk patients from those with false-negative results, it is important to combine several systems for evaluation. Similarly, false-positive diagnosis for tracheotomy must be avoided for patients who do not require this procedure. In this study, the κ coefficients for the Gupta and Cameron Scores were 0.61 and 0.6, respectively, indicating substantial agreement. The reason why some patients who needed tracheotomy according to the scores did not actually undergo tracheotomy was considered to be as follows: the Cameron and Gupta Scores were high in patients who underwent resection of the mandibular area and surrounding tissue (e.g., buccal mucosa or floor of mouth) and hard tissue reconstruction. However, in these cases, the surgeons decided that intubation under sedation was possible when postoperative aspiration was not a concern. We confirmed that the Cameron and Gupta Scores were consistent with the surgeon's decision to some extent and could be applied generally to clinical decisions in hospitals in Japan.
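The trade-off between false-negative and false-positive results discussed above can be made concrete with a small Python sketch that derives sensitivity, specificity, PPV, and NPV from a 2 × 2 table. It assumes that the surgeon's actual decision is treated as the reference standard and that "selectivity" in the text corresponds to specificity; the function name `diagnostic_metrics` and the counts are hypothetical and are not the study data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Diagnostic accuracy measures for a score against a reference decision.

    tp: score says tracheotomy, tracheotomy performed (true positive)
    fp: score says tracheotomy, tracheotomy not performed (false positive)
    fn: score says no tracheotomy, tracheotomy performed (false negative)
    tn: score says no tracheotomy, tracheotomy not performed (true negative)
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of tracheotomy cases the score flags
        "specificity": tn / (tn + fp),  # share of non-tracheotomy cases the score clears
        "ppv": tp / (tp + fp),          # probability a flagged case had tracheotomy
        "npv": tn / (tn + fn),          # probability a cleared case had no tracheotomy
    }


# Hypothetical counts, NOT taken from the study
metrics = diagnostic_metrics(tp=14, fp=1, fn=6, tn=19)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this made-up table, the six false negatives lower sensitivity and NPV while specificity and PPV stay high, which is the same pattern the discussion describes for scores that miss some patients who went on to tracheotomy.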
These scores are weighted for bilateral neck dissection. However, when unilateral total neck dissection was performed along with resection of the primary lesion, the score was not always high, which was considered to be the cause of discrepancy. These values are affected by prevalence, although scores were effective for screening postoperative airway management. However, Schmutz et al. have reported that patient population differs by institution; therefore, they failed to predict the need for tracheotomy based on these clinical scoring systems [21]. Similarly, Lee et al. reported that they could not identify correlations between the need for tracheotomy and the clinical findings for patients with oral cancer based on the Cameron score [22]. Moreover, Benetar et al. conducted analysis based on the Cameron score for performing elective tracheotomy in oral cancer surgery.
Analysis revealed high selectivity and PPV (90% for both) but low sensitivity (70%) and NPV (67%); these findings made it difficult to determine whether tracheotomy was necessary [23]. Reports that re-evaluated these scores suggest that patients who actually require tracheotomy may not be accurately identified, and that tracheotomy is suggested in a large proportion of cases. This is likely due to large differences in choices of surgical methods, decisions based on surgeons' experience, and patient population. Moreover, postoperative hematoma and pharyngolaryngeal edema cannot be predicted directly from the scores investigated in this study. Clinically, airway obstruction advances rapidly in such cases and then becomes irreversible. Intratracheal intubation or tracheotomy must be selected under emergency situations; these urgent decisions are the largest problem.
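The remark that these agreement values are affected by prevalence can be illustrated with a short Python sketch. It assumes a score with fixed sensitivity and specificity relative to the surgeon's decision and computes the expected κ at different tracheotomy prevalences; the function name `kappa_at_prevalence` and all numerical inputs are hypothetical and serve only to show the direction of the effect.

```python
def kappa_at_prevalence(sensitivity, specificity, prevalence):
    """Expected Cohen's kappa for a score with fixed accuracy at a given prevalence.

    prevalence is the proportion of patients in whom tracheotomy was performed.
    """
    # Expected cell proportions of the 2x2 table
    both_yes = prevalence * sensitivity
    surgeon_only = prevalence * (1 - sensitivity)
    score_only = (1 - prevalence) * (1 - specificity)
    both_no = (1 - prevalence) * specificity

    p_obs = both_yes + both_no           # observed agreement
    score_yes = both_yes + score_only    # marginal: score says "yes"
    # Chance agreement from the two marginals
    p_exp = score_yes * prevalence + (1 - score_yes) * (1 - prevalence)
    return (p_obs - p_exp) / (1 - p_exp)


# Same hypothetical accuracy (90% / 90%) at two different prevalences
for prevalence in (0.5, 0.1):
    k = kappa_at_prevalence(0.9, 0.9, prevalence)
    print(f"prevalence = {prevalence:.1f}: kappa = {k:.2f}")
```

With identical sensitivity and specificity, κ is noticeably lower when tracheotomy is uncommon, so κ values from institutions with different tracheotomy rates are not directly comparable.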
These scoring systems were reported years ago, and since then, there have been advancements in equipment (e.g., energy devices), improvements in surgical techniques and perioperative care, and extensive changes in the applicability criteria of surgical procedures. A scoring system that accommodates such advancements and changes needs to be established in the future.
There were a few limitations in this study. Notably, we excluded patients who underwent cerclage because the risk of postoperative airway obstruction was expected to be low in such cases. However, there have been reported cases of severe outcomes resulting from occlusions, even if resection was solely performed on the frontal part of the mandible, as well as in cases where only the primary lesion was excised or a single neck dissection was performed [22,23]. Even in cases that appear to be low risk, resection or abrasion of the genioglossus muscle, geniohyoid muscle, or mylohyoid muscle may worsen airway obstruction due to the loss of support of the hyoid bone. Such procedures with moderate surgical invasion are managed outside the intensive care unit and, therefore, pose a risk of delayed treatment of airway obstruction. Evaluations of such cases are required in the future.

Conclusions
We examined the degree of agreement between the physician's evaluation and the tracheotomy scores' evaluations of the need for tracheotomy. The κ coefficients of the Gupta and Cameron scores were 0.61 (95% CI, 0.4-0.82) and 0.6 (95% CI, 0.38-0.82), respectively. Thus, moderate congruity was found between the physician's evaluation and the tracheotomy scores' evaluations. In this study, the Cameron and Gupta scores agreed with the surgeon's experienced judgment to some extent, and they were confirmed to be applicable to clinical judgment in the hospital setting. These values are affected by prevalence; however, the scores were effective for screening postoperative airway management. Importantly, postoperative hematoma and pharyngolaryngeal edema cannot be predicted directly from the scores investigated in this study; therefore, scoring systems that can accommodate such changes should be established.

Ethics approval and consent to participate
The present retrospective cohort study was approved by the Nagoya Ekisaikai Hospital Institutional Review Board (no. 2018-009). Individual consent from each patient was not required because extubation after major oral cancer surgery is the standard approach in our institution. Since the study was retrospective, only closed cases were included, and the institutional review board waived the need for obtaining the consent of the retrospectively analyzed patients. All procedures were performed in accordance with the ethical standards of the institutional and/or national research committee and in line with the 1964 Declaration of Helsinki.

Consent for publication
Written informed consent was obtained from the patient for publication of this case report and any accompanying images.

Availability of data and materials
All data generated or analyzed during this study are included in this published article. The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
The present research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Authors' contributions
AA conceived, designed, and coordinated the study; and wrote the manuscript. KK and EU critically revised the manuscript for important intellectual content and gave the final approval of the version to be submitted. HH, YI, and MA collected the clinical data and drafted the article. All authors read and approved the final manuscript.

Figure 1. Number of patients in the airway management group, according to tracheostomy score: (a) Cameron score; (b) Gupta score.