A Comparison between W score and Ryan score in the diagnosis of Laryngopharyngeal reflux disease

Objective To assess the diagnostic value of the W score, which was designed to distinguish laryngopharyngeal reflux disease (LPRD) patients from the normal population using Dx-pH monitoring, in comparison with the Ryan score. Methods One hundred and eight patients with suspected LPRD and complete follow-up results after more than 8 weeks of anti-reflux therapy were enrolled from the Departments of Otolaryngology-Head and Neck Surgery, Gastroenterology and Respiratory Medicine of seven hospitals. Their pre-treatment Dx-pH monitoring data were reanalyzed to obtain the W score in addition to the Ryan score, and the diagnostic sensitivity and specificity of the two scores were compared using the outcome of anti-reflux therapy as the reference. Results Anti-reflux therapy was effective in 87 patients (80.6%) and ineffective in 21 patients (19.4%). Twenty-seven patients (25.0%) had a positive Ryan score. The W score was positive in 79 patients (73.1%). Fifty-two patients had a negative Ryan score but a positive W score. The diagnostic sensitivity, specificity, positive predictive value and negative predictive value of the Ryan score were 28.7%, 90.5%, 92.6% and 23.5%, respectively (kappa = 0.092, p = 0.068), whereas those of the W score were 83.9%, 71.4%, 92.4% and 51.7%, respectively (kappa = 0.484, p < 0.001). Conclusions The W score is much more sensitive than the Ryan score for the diagnosis of LPRD. Prospective studies with larger patient populations are necessary to validate and improve the new diagnostic criteria.


Introduction
Laryngopharyngeal reflux disease (LPRD) is a series of diseases caused by the reflux of gastric contents above the upper esophageal sphincter, and it affects as many as 10.15% of adult outpatients in the otorhinolaryngology-head and neck surgery departments of class A tertiary comprehensive hospitals in China [1]. A landmark study by Koufman in 1991 demonstrated that LPRD and gastroesophageal reflux disease (GERD) should be recognized as distinct entities [2]. In contrast to typical GERD, LPRD usually does not present with typical symptoms such as acid regurgitation and heartburn, and many of its signs are nonspecific. The specificity and sensitivity of the reflux symptom index (RSI), which is based on reflux-related symptoms, and of the reflux finding score (RFS), which is based on laryngoscopic signs, are insufficient [3]. Many experts insist that empirical drug treatment is a reliable method of LPR diagnosis [4,5]. Nevertheless, it has many problems, including the criteria for starting treatment, placebo effects and patient compliance [4,6]. Objective diagnosis is therefore even more important for LPRD than for GERD.
Presently, the most widely used objective detection method is dynamic pH monitoring. In 2009, Ayazi reported a new technique for measuring pharyngeal pH using the Restech Dx-pH Measurement System (Dx-pH) [7]. In addition to its easy operation, high sensitivity and good patient tolerance, the probe can detect gaseous reflux, and the technique has become a useful tool for diagnosing LPRD [8-12]. The Ryan score, proposed by Dr. DeMeester, has played a significant role in the diagnosis of LPRD, especially in evaluating surgical indications and surgical efficacy for LPRD patients. Nevertheless, its sensitivity is low, which means that a negative Ryan score is not a sufficient indicator for excluding LPRD; this stems from imperfections in its development [13]. Based on pH monitoring, the Ryan score uses as its indicator the normalized distance to the 95th percentile of the distribution of 55 "normal" subjects, who were selected merely by excluding typical GERD symptoms. Hence, a new test with a sounder statistical basis is needed.
In a recently published article, we introduced a new score (named the W score) based on machine learning techniques and semi-supervised learning [13]. In that article we described the scientific rationale of the method by which the W score was derived. Based on the W score, a subject is diagnosed with LPRD if W > 0 and without LPRD if W < 0. Additionally, the W score, which benefits from a linear model structure, provides an indicator of the probability that a patient is suffering from LPRD: the smaller the value, the lower the probability of LPRD. However, strong evidence that the W score performs better than the Ryan score in clinical practice is still lacking, so we assessed its diagnostic performance in this study.

Study Population
This study was conducted as a retrospective analysis of patients who were suspected of having LPRD and had completed 24h pharyngeal pH monitoring. All the patients came from the Departments of Otorhinolaryngology, Gastroenterology and Respiratory Medicine of 6 comprehensive hospitals in China and one hospital in the U.S.A. from 2016 to March 2019. Adult patients were considered if they had LPRD-related symptoms (dry pharynx, pharynx itching, cough, pharynx foreign body sensation, burning sensation of the pharynx, etc.) for more than 1 month, an RSI score ≥ 10 and/or RFS score > 7 and/or a Visual Analogue Scale (VAS) score of the most serious symptom ≥ 5, and an unsatisfactory therapeutic effect from conventional treatments. None of the patients had previously tried anti-reflux therapy. One hundred and eight cases with complete follow-up results after 2-3 months of anti-reflux therapy were enrolled. The diagnosis of LPRD was considered established if the patient responded or partially responded. The diagnosis of non-responders (signs and symptoms unchanged or worsened) remained uncertain, and additional examinations were suggested (Fig. 1). The research was conducted according to the Declaration of Helsinki, and ethics approval was obtained from the Ethics Committee of PLA 306th Hospital (K2017 [06]; Clinical Trial Registry: ChiCTR1800014931). All participants provided written informed consent.

Methodology
Scale scoring: All patients filled out the RSI and underwent stroboscopy before and after treatment. RFS was determined by two blinded senior physicians who worked independently and did not know the clinical state of the subjects in advance. VAS was scored according to the most serious symptom.
Oropharyngeal pH monitoring: Oropharyngeal pH monitoring was performed before treatment using a Dx-pH system and the Restech® pH probe (Respiratory Technology Corp., San Diego, CA, USA). After calibration in pH 7 and pH 4 buffer solutions, the probe was inserted through the nasal cavity until its flashing LED tip was seen 5-10 mm below the uvula. The pH monitoring lasted 24 hours, and all subjects were asked to participate in normal daily activities as much as possible and to record the beginning and ending times of meals, as well as when they were in an upright or supine position.
The Ryan score was calculated from the original data by software (Bio View Analysis, Sandhill Scientific, Highlands Ranch, CO, and Data View Lite, Respiratory Technology Corp, San Diego, CA). LPRD was diagnosed if the Ryan score was > 9.41 in the upright position or > 6.79 in the supine position. The original pH data were also used to calculate the W score; the detailed calculation methods can be found in our recently published article [13]. Based on the W score, a subject is diagnosed with LPRD if W > 0 and without LPRD if W < 0.
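The two decision rules above can be written down as a minimal sketch. The function names are hypothetical and only the cut-offs come from the text; the scores themselves are assumed to have been computed already by the vendor software and by the method in [13]. The text does not say how W = 0 is handled, so the sketch returns None for that case rather than guessing.

```python
from typing import Optional

# Cut-offs as stated in the Methodology text.
RYAN_UPRIGHT_CUTOFF = 9.41
RYAN_SUPINE_CUTOFF = 6.79


def ryan_positive(upright_score: float, supine_score: float) -> bool:
    """Positive if either positional Ryan score exceeds its cut-off."""
    return upright_score > RYAN_UPRIGHT_CUTOFF or supine_score > RYAN_SUPINE_CUTOFF


def w_positive(w_score: float) -> Optional[bool]:
    """True if W > 0, False if W < 0, None if W == 0 (not defined in the text)."""
    if w_score > 0:
        return True
    if w_score < 0:
        return False
    return None
```

A patient with an upright Ryan score of 4.2, a supine score of 2.1 and W = 0.35 would be Ryan-negative but W-positive under these rules, which is the discordant pattern reported for 52 patients in this study.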
Anti-reflux therapy: Both lifestyle adjustment advice and anti-reflux medication were given. According to the individual situation of each patient, specific suggestions for lifestyle adjustment were given, such as avoiding caffeine or theobromine, quitting alcohol and tobacco, increasing exercise and reducing weight. Medications were given at the same time, such as PPIs, H2 antagonists, GI prokinetic agents and gastric mucosal protective agents. Patients were followed up after 2-3 months of anti-reflux therapy.

Data analysis
Data analysis was performed using Statistical Package for the Social Sciences, version 20.0 (SPSS, Chicago, IL, USA). Normally distributed measurement data were expressed as mean ± SD. Data with a skewed distribution were expressed as M [P25; P75]. The outcome of diagnostic treatment was taken as the gold standard. Reflux-related scales before and after treatment were compared with Student's t test. The sensitivity, specificity, positive and negative predictive values, and diagnostic accuracy of each of the two scores were assessed. Categorical data were compared with the Pearson Chi-squared test. P < 0.05 was considered statistically significant.
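As an illustration of the measures listed above, the following sketch computes sensitivity, specificity, predictive values and Cohen's kappa from a 2x2 table. The class and function names are hypothetical, and the analysis in this study was done in SPSS, not with this code. The cell counts are an assumption: they are back-calculated from the percentages reported in the abstract (87 responders, 21 non-responders, 27 Ryan-positive and 79 W-positive patients) rather than taken from the study tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # test positive, therapy effective
    fp: int  # test positive, therapy ineffective
    fn: int  # test negative, therapy effective
    tn: int  # test negative, therapy ineffective


def diagnostic_metrics(t: TwoByTwo) -> dict:
    n = t.tp + t.fp + t.fn + t.tn
    observed = (t.tp + t.tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((t.tp + t.fp) * (t.tp + t.fn)
                + (t.fn + t.tn) * (t.fp + t.tn)) / n ** 2
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Counts reconstructed from the reported percentages (assumption).
ryan = TwoByTwo(tp=25, fp=2, fn=62, tn=19)
w = TwoByTwo(tp=73, fp=6, fn=14, tn=15)

for name, table in (("Ryan", ryan), ("W", w)):
    metrics = diagnostic_metrics(table)
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

With these reconstructed counts the sketch reproduces the reported values to rounding, including kappa of about 0.092 for the Ryan score and about 0.484 for the W score.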

Results
One hundred and eight patients were evaluated. After 2-3 months of follow-up, anti-reflux therapy was effective in 87 patients (80.6%; responders and partial responders) and ineffective in 21 patients (19.4%; non-responders). The Ryan score was positive in 27 patients (25.0%) and the W score was positive in 79 patients (73.1%). Fifty-two patients had a negative Ryan score but a positive W score (Table 1). In the effective group, the mean RSI and RFS values decreased significantly after anti-reflux therapy (P < 0.001), as did the VAS of the main symptom (P < 0.001). In the ineffective group, there was no significant difference in the mean values of RSI, VAS and RFS before and after treatment (Table 2).

Discussion
A series of symptoms, signs or diseases, such as dysphonia, hysteria, cough, subglottic stenosis, laryngeal spasm, laryngeal contact granuloma, asthma, and even chronic sinusitis and laryngeal cancer, are associated with LPRD. However, there is still no diagnostic method that combines high sensitivity and specificity, good patient acceptance and easy operation and that is widely recognized by clinicians. Although the RSI and RFS are proven clinical tools, they do not include many common symptoms and reflux signs, and they do not take into account the frequency of symptoms [3,14-17]. Moreover, the patients' own emotional and psychological factors and variation between doctors in laryngoscopy scoring always influence the results [8]. Objective examination is needed to clarify the existence of reflux. pH monitoring is an objective diagnostic method that accurately reflects changes in H+ concentration in the esophagus and/or airway. In 1969, ambulatory catheter-based esophageal pH monitoring was used to diagnose GERD [18]. In 1989, dual-probe esophageal pH monitoring was used to diagnose LPRD by Wiener et al. [19]. However, studies have shown that 45% of patients receiving dual-sensor esophageal pH monitoring have misplaced proximal sensors [20]. The Dx-pH monitoring system provides accurate probe positioning (0.5-1 cm below the uvula) and appears to be more sensitive than traditional pH monitoring in the evaluation of patients with extraesophageal reflux [21].
In the analysis of 24h pH monitoring data, the Ryan score is calculated from the number of episodes in which pH falls below the normal range, the length of the longest episode, and the total percentage of time spent below the threshold. Because the pH threshold differs between the upright and supine positions, the Ryan score is calculated separately for each. However, the Ryan score still has some shortcomings that restrict its application. First, the Ryan score was derived from samples from 55 normal people [7]. The criterion for a normal person was merely the absence of typical GERD symptoms such as acid reflux and heartburn, and no laryngoscopy was performed. In fact, many LPRD patients do not have gastrointestinal symptoms. Second, the pH thresholds of 5.5 in the upright position and 5.0 in the supine position may be inaccurate, as pepsin is still active at pH 6.5 and can cause airway mucosal damage. In addition, the calculation does not distinguish between events below the threshold according to their severity. Our study found that the original pH distribution data obtained by dynamic monitoring in normal people and LPRD patients are not normally distributed and that these distributions overlap [13]. The Ryan score, which was obtained only from the data of normal people, cannot account for this overlap. Although current clinical application shows that the Ryan score can predict the efficacy of anti-reflux surgery, a negative Ryan score is still not a sufficient indicator for excluding LPRD, which means that a large number of patients are misdiagnosed. Hence, more accurate diagnostic criteria for Dx-pH monitoring in LPRD are needed.
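To make the three inputs of the Ryan score concrete, the sketch below extracts them from an evenly sampled pH trace for a single body position. It is an illustration only: the function name and the `sample_seconds` parameter are hypothetical, and the way the vendor software normalizes and combines these quantities into the composite score is not described here, so the sketch stops at the raw components.

```python
from typing import Sequence


def episode_summary(ph: Sequence[float], threshold: float,
                    sample_seconds: float) -> dict:
    """Summarize excursions below `threshold` in an evenly sampled pH trace."""
    episodes = 0        # number of separate runs below the threshold
    longest = 0         # longest run, in samples
    current = 0         # length of the run in progress, in samples
    below = 0           # total samples below the threshold
    for value in ph:
        if value < threshold:
            below += 1
            current += 1
            if current == 1:
                episodes += 1   # a new episode starts here
            longest = max(longest, current)
        else:
            current = 0
    percent = 100.0 * below / len(ph) if len(ph) else 0.0
    return {
        "episodes": episodes,
        "longest_seconds": longest * sample_seconds,
        "percent_time_below": percent,
    }


# Toy upright-position trace (made-up values), threshold 5.5 as in the text.
upright_trace = [6.5, 5.2, 5.1, 6.0, 4.9, 6.2, 6.3, 5.4]
summary = episode_summary(upright_trace, threshold=5.5, sample_seconds=1.0)
```

Note that a sample at pH 5.4 and a sample at pH 4.0 count identically here, which illustrates the point made above that events below the threshold are not graded by severity.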
In contrast to studies in which simple statistics were used to characterize pH variation at the pharynx, advanced statistical methods were employed to develop the W score and to explore the possibility of discriminating LPRD patients from normal subjects. To derive the new score, machine learning methods that used both labelled data and a large amount of unlabelled data were employed to analyze the long-term pH data, and semi-supervised learning was used to alleviate imperfections in the data and the reference test by exploring the underlying data patterns. However, the score still needs further clinical verification, which is quite challenging because there is no gold standard for diagnosing LPRD. Although empirical treatment has many problems, including the criteria for starting treatment, poor compliance, high cost, low follow-up rate and especially placebo effects, many experts insist that it is a reliable and effective method of LPR diagnosis [4,6,22,23]. We therefore conducted this multi-center retrospective study by analyzing the data of 108 patients who underwent a formal course of anti-reflux therapy.
In our research, the inclusion criteria for anti-reflux therapy were not in accordance with the commonly used criteria (RSI > 13 and RFS > 7). Based on our previous clinical experience, patients with RSI < 13 can also benefit from anti-reflux therapy. Therefore, we compared the effective rate between patients with an RSI of 10-13 and patients with an RSI > 13, and the result showed no significant difference. In addition to RSI and RFS, we also used VAS in combination, especially for suspected LPR patients with a single symptom or a chief complaint not included in the RSI. Acid-suppressing therapy (PPIs and/or H2-antagonists) and GI prokinetic agents were given together to ensure that patients with nonacid reflux were also treated effectively. Unfortunately, alginates were not given to Chinese patients because they lack a domestic clinical drug license. Anti-reflux surgery is considered to be effective for some patients who respond poorly to medication, but it was not included in this research [24]. The follow-up period was 2-3 months, which reduced the false negative results caused by insufficient treatment time in some patients. We found that the sensitivity of the W score was significantly higher than that of the Ryan score, which decreased the missed diagnosis of LPRD, and that the specificity of the two scores was not significantly different. More LPRD patients could be identified through the W score and benefit from anti-reflux treatment in clinical practice.
Compared with the Ryan score, the sensitivity of the W score is clearly improved, but its specificity is not yet high enough and needs further improvement.

Limitations
Our study has several limitations. We regarded the efficacy of anti-reflux therapy after 2-3 months as the diagnostic criterion; nevertheless, the placebo effect could not be excluded. Patients who did not respond to anti-reflux drugs might respond to surgery, so underdiagnosis may also exist. In addition, the W score was verified only by a retrospective analysis; therefore, further validation will be needed before clinical application. Finally, although alkaline reflux does exist in clinical practice, the W score, like the Ryan score, cannot detect this kind of patient, and more work should be done to improve it.

Conclusions
In this multi-center study, we assessed the performance of a new score based on machine learning for diagnosing LPRD using the Restech Dx-pH Measurement System. The W score may be useful in screening LPRD patients but needs further validation before its application in clinical practice.

Declarations
Funding: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Notation of prior abstract publication/presentation: No
Authors' Contributions: Drs Gang Wang, Lei Wang, Zhezhe Sun had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs Gang Wang, Lei Wang, Zhezhe Sun contributed equally as first authors to this work. Drs Wei Wu and Lianyong Li directed this work and contributed equally as corresponding authors.
Conflict of Interest Disclosures: All authors declare that there are no conflicts of interest.
Figure 1 Flow chart describing the assessment of the possible LPRD patients.