Do Serial Lung Ultrasound Scores Predict Prolonged Mechanical Ventilation in Patients With Severe COVID-19? A Single-centre Retrospective Cohort Study

Background: Some patients with severe coronavirus disease 2019 (COVID-19) who present with fibrosis on computed tomography (CT) require prolonged mechanical ventilation (PMV). Lung ultrasound (LUS), a rapid bedside test, has been reported to yield findings consistent with those of CT. This study therefore aimed to assess whether serial LUS scores could predict PMV or successful extubation in patients with severe COVID-19. Methods: LUS was performed in 20 consecutive patients with severe COVID-19 at three time points: admission (day 1), after 48 h (day 3), and at the seventh-day follow-up (day 7). We compared the LUS score with chest X-ray and laboratory findings at the same three time points. We also assessed the inter-rater reliability (IRR) of the LUS score among examiners. Results: Mortality did not differ significantly between the groups for either outcome. The PMV and non-PMV groups differed significantly in LUS scores on days 3 and 7, X-ray (XP) score on day 7, and PaO2/FiO2 (P/F) ratio on day 7 (p < 0.05). The successful and non-successful extubation groups differed significantly in LUS scores on days 3 and 7, C-reactive protein (CRP) level on day 7, and P/F ratio on day 7 (p < 0.05). For PMV, the areas under the curve (AUCs) of the LUS score on days 3 and 7, XP score on day 7, and P/F ratio were 0.88, 0.98, 0.77, and 0.80, respectively; for successful extubation, the AUCs of the LUS score on days 3 and 7, CRP level on day 7, and P/F ratio were 0.79, 0.90, 0.82, and 0.79, respectively. The trajectories of serial LUS scores differed significantly between the groups: the LUS score on day 7 was higher than that on day 1 in the PMV group but lower in the successful extubation group (p < 0.05). However, IRR agreement for the change in LUS score from day 1 to day 7 was only slight (κ = 0-0.31).

Conclusions: Serial LUS scores in patients with severe COVID-19 could predict PMV and successful extubation.
Automated ultrasound interpretation, for example using deep learning, may be needed to overcome inter-rater disagreement.

Background
Patients who require prolonged mechanical ventilation (PMV) have a higher mortality rate and incur higher costs than those who do not [1]. The average duration of invasive mechanical ventilation in patients with coronavirus disease 2019 (COVID-19) admitted to the intensive care unit (ICU) has been reported to be approximately 8.4 (95% confidence interval [CI] 1.6-13.7) days; however, in some patients, ventilation is prolonged [2,3]. Patients receiving PMV are usually excluded as candidates for extracorporeal membrane oxygenation (ECMO), and with limited resources during a pandemic, this may amount to withdrawal of treatment [4,5]. A rapid surge in medical demand depletes ventilators and ICU beds, making it necessary to use anaesthetic machines instead of ventilators [6]. Therefore, when ventilatory management becomes necessary, it is very important to predict whether a patient will require PMV or can be extubated.
Patients with COVID-19 who require PMV have been reported to present with fibrosis on computed tomography (CT) [7,8]. Although CT is useful for assessing the severity of lung involvement, it requires the transport of critically ill, invasively ventilated patients to radiology facilities, which is challenging [9,10]. Lung ultrasound (LUS) is a rapid, bedside, goal-oriented diagnostic test used to answer specific clinical questions, and its findings have been reported to be consistent with CT findings [11,12]. Moreover, LUS can be assessed quantitatively, and the serial LUS score can be used to track disease progression [13,14]. However, there is concern about whether ultrasound findings can be interpreted reliably by different examiners [15]. Therefore, in this study, we evaluated whether the serial LUS score could predict PMV and successful extubation in patients with severe COVID-19 requiring invasive mechanical ventilation, and we determined the inter-rater reliability (IRR) of the score among examiners.

Study design and population
This retrospective, single-centre, observational study included consecutive patients treated at Yokohama City University Medical Centre Advanced Critical Care and Emergency Centre, Japan, a hospital designated for treating severe COVID-19, from 1 May 2020 to 28 February 2021. Patients were included if they had a positive nasopharyngeal reverse transcription polymerase chain reaction test for severe acute respiratory syndrome coronavirus 2, were aged > 18 years, and required mechanical ventilation for more than 48 h. The exclusion criteria were acute heart failure, interstitial pneumonia, other pulmonary diseases affecting image acquisition, a suboptimal ultrasound window, missing ultrasound data, and patients' refusal to consent.
The study was approved by the institutional ethics board of Yokohama City University Medical Centre (approval number: B200200049). Written informed consent was waived, as ultrasound scanning of the lungs is considered a routine procedure.

Patient management
During invasive mechanical ventilation, sedation and analgesia were titrated to a Richmond Agitation-Sedation Scale score below −3 in patients with strong respiratory effort, and muscle relaxants were also administered if necessary. Respiratory effort was assessed based on the airway occlusion pressure (P0.1) and physical examination, and a P0.1 > 4 cmH2O was considered strong respiratory effort [16]. If respiratory effort was calm (for example, P0.1 ≤ 4 cmH2O without use of the accessory respiratory muscles), a daily spontaneous awakening trial (SAT) was performed, and the patient was managed according to the Pain, Agitation/Sedation, Delirium, Immobility, and Sleep Disruption guidelines [17].
Patients were ventilated in pressure-controlled mode with a driving pressure < 14 cmH2O and a respiratory frequency < 15 breaths/min, and positive end-expiratory pressure (PEEP) was set according to the high-PEEP table of the acute respiratory distress syndrome (ARDS) Network [18]. FiO2 was adjusted to maintain SpO2 > 93%. Based on the PROSEVA study, the criteria for introducing prone ventilation were FiO2 > 60% and PaO2/FiO2 (P/F) ratio < 150 [19].
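As a small illustration of the oxygenation thresholds quoted above, the sketch below computes the P/F ratio and checks the prone-ventilation trigger (FiO2 > 60% and P/F < 150). It is a minimal sketch under stated assumptions: PaO2 is in mmHg, FiO2 is expressed as a fraction, and the function names are hypothetical rather than part of the study's workflow.

```python
def pf_ratio(pao2_mmhg: float, fio2: float) -> float:
    """Return the PaO2/FiO2 (P/F) ratio; FiO2 is a fraction (0.21-1.0)."""
    if not 0.21 <= fio2 <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    return pao2_mmhg / fio2


def meets_prone_criteria(pao2_mmhg: float, fio2: float) -> bool:
    """Hypothetical check of the trigger stated in the text: FiO2 > 60% and P/F < 150."""
    return fio2 > 0.60 and pf_ratio(pao2_mmhg, fio2) < 150


if __name__ == "__main__":
    # PaO2 80 mmHg on FiO2 0.7 gives a P/F ratio of about 114, below 150.
    print(round(pf_ratio(80, 0.7), 1), meets_prone_criteria(80, 0.7))
```

Both conditions must hold: a P/F ratio of 120 on FiO2 0.5 would not meet this trigger because FiO2 is not above 60%.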
Prone ventilation was performed for at least 16 h. Based on the EOLIA trial, the criteria for introducing veno-venous ECMO were FiO2 > 80% and P/F ratio < 80 for at least 6 h [20]. To evaluate readiness for extubation, a spontaneous breathing trial was performed after the SAT, and patients with a Rapid Shallow Breathing Index (RSBI) < 100 were extubated [21]. Fluid management aimed to keep fluid gain within 10% of body weight. The final decision to extubate was made by a team that included the physician in charge. Tracheostomy was performed after 2 weeks of intubation owing to the risk of infection, in accordance with the guideline [22].
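The RSBI is the respiratory rate (breaths/min) divided by the tidal volume in litres. The sketch below computes it and combines it with the P0.1 threshold described earlier into a single screening function. This is a hypothetical simplification for illustration: the function names and the combined screen are assumptions, and in the study the final extubation decision was made by the clinical team.

```python
def rsbi(resp_rate_per_min: float, tidal_volume_ml: float) -> float:
    """Rapid Shallow Breathing Index: respiratory rate divided by tidal volume in litres."""
    if tidal_volume_ml <= 0:
        raise ValueError("tidal volume must be positive")
    return resp_rate_per_min / (tidal_volume_ml / 1000.0)


def passes_extubation_screen(p01_cmh2o: float, uses_accessory_muscles: bool,
                             resp_rate_per_min: float, tidal_volume_ml: float) -> bool:
    """Hypothetical screen combining the thresholds quoted in the text.

    Calm respiratory effort (P0.1 <= 4 cmH2O, no accessory-muscle use) permits the
    awakening and breathing trials; an RSBI < 100 during the breathing trial passes.
    """
    calm_effort = p01_cmh2o <= 4 and not uses_accessory_muscles
    return calm_effort and rsbi(resp_rate_per_min, tidal_volume_ml) < 100


if __name__ == "__main__":
    # 25 breaths/min at 500 mL gives an RSBI of 50 breaths/min/L.
    print(rsbi(25, 500), passes_extubation_screen(3, False, 25, 500))
```

A patient breathing 30 times per minute at 250 mL (RSBI 120) would fail the screen even with calm respiratory effort.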

Clinical data and outcomes
Data on patients' demographic characteristics, imaging and laboratory findings, comorbidities, complications, COVID-19 treatment, and outcomes were extracted from the electronic medical records. Laboratory tests and chest X-rays were recorded daily after admission to assess COVID-19 progression and fibrosis. LUS was also performed daily when possible (depending on the workload and available medical staff). CT was usually performed at admission and thereafter when the physician in charge deemed it necessary. The primary endpoint was PMV, and the secondary endpoint was successful extubation. PMV was defined as the requirement for mechanical ventilation for > 21 days, including reintubation within 7 days, according to the National Association for Medical Direction of Respiratory Care Consensus Conference [1]. Successful extubation was defined as remaining free of reintubation for > 7 days after extubation.
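The two endpoint definitions can be made concrete with a short sketch. The text does not state exactly how ventilated days are counted across a reintubation; this sketch assumes one possible reading, summing the ventilated days of episodes linked by reintubation within 7 days. The `VentEpisode` structure and all function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VentEpisode:
    """One period of invasive ventilation, in days from the first intubation (hypothetical)."""
    start_day: float
    end_day: float
    ended_by_extubation: bool = True  # False if the patient died or was still ventilated


def linked_ventilation_days(episodes: list[VentEpisode], window: float = 7.0) -> float:
    """Sum ventilated days over episodes linked by reintubation within `window` days."""
    if not episodes:
        raise ValueError("at least one episode is required")
    eps = sorted(episodes, key=lambda e: e.start_day)
    total = eps[0].end_day - eps[0].start_day
    for prev, cur in zip(eps, eps[1:]):
        if cur.start_day - prev.end_day > window:
            break  # a later, unlinked episode is treated as a separate course
        total += cur.end_day - cur.start_day
    return total


def is_pmv(episodes: list[VentEpisode], threshold_days: float = 21.0) -> bool:
    """PMV: more than 21 ventilated days, counting reintubation within 7 days."""
    return linked_ventilation_days(episodes) > threshold_days


def had_successful_extubation(episodes: list[VentEpisode], window: float = 7.0) -> bool:
    """True if any extubation was followed by more than `window` days without reintubation.

    Assumes follow-up extends at least `window` days beyond the last extubation.
    """
    eps = sorted(episodes, key=lambda e: e.start_day)
    for i, ep in enumerate(eps):
        if not ep.ended_by_extubation:
            continue
        nxt = eps[i + 1] if i + 1 < len(eps) else None
        if nxt is None or nxt.start_day - ep.end_day > window:
            return True
    return False
```

Under this reading, a patient extubated on day 15 and reintubated on day 18 until day 30 accumulates 27 linked days and is classified as PMV, even though the second extubation was eventually successful.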
Performing LUS and chest X-ray scoring
LUS examinations were performed using an ultrasound system (GE Venue Go) with a 5-12-MHz linear transducer. Six regions per hemithorax were examined (superior and inferior regions anteriorly, laterally, and posteriorly), giving a total of 12 regions; the probe was placed in the intercostal spaces to obtain wide views. In each region, LUS signs, including B-lines/consolidation and pleural line abnormalities, were assessed, and the worst LUS sign was recorded, as in a previous study [23].
B-lines/consolidations were quantitatively scored as follows: score 0, well-spaced B-lines < 3; score 1, well-spaced B-lines ≥ 3; score 2, multiple coalescent B-lines; and score 3, lung consolidation. The pleural line was quantitatively scored as follows: score 0, normal; score 1, irregular pleural line; and score 2, blurred pleural line. The sum of both scores across all 12 zones yielded the final score (range, 0-60), which was defined as the LUS score.
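The scoring rule above is a per-zone sum, so the maximum is 12 × (3 + 2) = 60. A minimal sketch of the calculation follows; the finding labels are hypothetical strings paraphrasing the text, and an unrecognised label raises a `KeyError`.

```python
from __future__ import annotations

# Per-zone point values as listed in the text (labels are paraphrased).
B_LINE_POINTS = {
    "well-spaced B-lines < 3": 0,
    "well-spaced B-lines >= 3": 1,
    "multiple coalescent B-lines": 2,
    "lung consolidation": 3,
}
PLEURAL_LINE_POINTS = {"normal": 0, "irregular": 1, "blurred": 2}
N_ZONES = 12  # six regions per hemithorax


def lus_score(zones: list[tuple[str, str]]) -> int:
    """Sum B-line/consolidation and pleural-line points over all 12 zones (0-60).

    Each zone is (worst B-line/consolidation finding, worst pleural-line finding).
    """
    if len(zones) != N_ZONES:
        raise ValueError(f"expected {N_ZONES} zones, got {len(zones)}")
    return sum(B_LINE_POINTS[b] + PLEURAL_LINE_POINTS[p] for b, p in zones)


if __name__ == "__main__":
    # Six zones with >= 3 B-lines and an irregular pleura, six with coalescent B-lines.
    zones = [("well-spaced B-lines >= 3", "irregular")] * 6 + [("multiple coalescent B-lines", "normal")] * 6
    print(lus_score(zones))
```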
The LUS score was evaluated by two emergency physicians, each of whom had performed more than 25 LUS examinations and was blinded to the clinical data [24]. They scored independently, and final scores were reached by consensus. In addition, to assess IRR, the LUS score was evaluated by five less-experienced emergency physicians.
The radiographic assessment of lung oedema (RALE) score was used to evaluate chest X-rays [25]. To determine the RALE score, each radiograph was divided into quadrants, defined vertically by the vertebral column and horizontally by the first branch of the left main bronchus. Each quadrant was assigned a consolidation score of 0-4, quantifying the extent of alveolar opacities based on the percentage of the quadrant that was opacified, and, unless its consolidation score was 0, a density score of 1-3 quantifying the overall density of the alveolar opacities (1 = hazy, 2 = moderate, and 3 = dense). The final RALE score was calculated by summing the products of the consolidation and density scores of the four quadrants, and ranges from 0 (no infiltrates) to 48 (dense consolidation in > 75% of each quadrant).
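The RALE calculation is a sum of per-quadrant products, giving a maximum of 4 × 3 × 4 = 48. The text does not list the percentage bands behind the 0-4 consolidation score, so this sketch assumes the bands of the original RALE description (0 none, 1 < 25%, 2 25-50%, 3 50-75%, 4 > 75%); how values exactly on a band boundary are assigned is also an assumption.

```python
from __future__ import annotations


def consolidation_score(percent_opacified: float) -> int:
    """Map the % of a quadrant with alveolar opacities to 0-4 (assumed bands)."""
    if percent_opacified <= 0:
        return 0
    if percent_opacified < 25:
        return 1
    if percent_opacified <= 50:
        return 2
    if percent_opacified <= 75:
        return 3
    return 4


def rale_score(quadrants: list[tuple[float, int]]) -> int:
    """Sum of consolidation x density over the four quadrants (0-48).

    Each quadrant is (percent opacified, density 1=hazy, 2=moderate, 3=dense);
    density is ignored when the consolidation score is 0.
    """
    if len(quadrants) != 4:
        raise ValueError("RALE uses exactly four quadrants")
    total = 0
    for percent, density in quadrants:
        cons = consolidation_score(percent)
        if cons == 0:
            continue  # no opacities: no density score is assigned
        if density not in (1, 2, 3):
            raise ValueError("density must be 1, 2 or 3 when opacities are present")
        total += cons * density
    return total


if __name__ == "__main__":
    # Hypothetical radiograph: 30% hazy, 60% moderate, 10% dense, one clear quadrant.
    print(rale_score([(30, 1), (60, 2), (10, 3), (0, 1)]))
```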
The RALE score was also evaluated by two experienced emergency physicians who were blinded to the clinical data. They scored independently, and final scores were reached by consensus.

IRR of the LUS score
The LUS examinations were anonymised before scoring. The raters were blinded to the clinical information and to the assessments made by the other raters. We compared the LUS scores, and the changes in LUS score from day 1 to day 3 and from day 1 to day 7, between the experienced and less-experienced emergency physicians.
Changes in LUS score were categorised as follows: 0, no change from the day 1 LUS score; 1, decrease from the day 1 LUS score; and 2, increase from the day 1 LUS score.
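The three-level coding of LUS score changes can be expressed as a small function; the function and constant names are hypothetical. As written in the text, any difference of one point or more counts as a change, since no minimal clinically important difference is defined.

```python
NO_CHANGE, DECREASE, INCREASE = 0, 1, 2


def lus_change_category(day1_score: int, later_score: int) -> int:
    """Categorise a later LUS score relative to day 1: 0 no change, 1 decrease, 2 increase."""
    if later_score == day1_score:
        return NO_CHANGE
    return DECREASE if later_score < day1_score else INCREASE


if __name__ == "__main__":
    # Day 1 score 20 compared with hypothetical day 7 scores.
    print([lus_change_category(20, s) for s in (20, 15, 25)])
```

Coding each rater's changes this way turns the agreement question into a comparison of categorical labels, which is what a kappa statistic measures.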

Statistical analysis
Continuous variables are expressed as the mean ± standard deviation (SD) or median (interquartile range), as appropriate, and categorical variables as frequencies (percentages). Because the data were not normally distributed, continuous variables were compared using the Kruskal-Wallis test or two-way analysis of variance (ANOVA). Categorical variables were compared using the chi-square test or Fisher's exact test. To identify predictors of the outcomes, all potential predictors were included in univariate analyses (Mann-Whitney U test). Variables with p < 0.05 in the univariate analysis were entered into receiver operating characteristic (ROC) curve analysis, which was used to examine the sensitivity and specificity of the prognostic parameters and to determine the area under the curve (AUC) with its 95% confidence interval (CI).
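Because the univariate screen used the Mann-Whitney U test, it may help to note that the empirical ROC AUC equals U divided by the product of the two group sizes: the probability that a randomly chosen patient with the outcome has a higher value than one without, with ties counting one half. The sketch below computes it that way from hypothetical score lists. It does not reproduce the 95% CI, and the software the authors used is not stated.

```python
from __future__ import annotations


def mann_whitney_u(with_outcome: list[float], without_outcome: list[float]) -> float:
    """U statistic for the 'with outcome' group: pairs where it scores higher, ties count 0.5."""
    u = 0.0
    for x in with_outcome:
        for y in without_outcome:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def empirical_auc(with_outcome: list[float], without_outcome: list[float]) -> float:
    """ROC AUC = U / (n1 * n2), assuming higher values indicate the outcome."""
    if not with_outcome or not without_outcome:
        raise ValueError("both groups need at least one observation")
    n_pairs = len(with_outcome) * len(without_outcome)
    return mann_whitney_u(with_outcome, without_outcome) / n_pairs


if __name__ == "__main__":
    # Hypothetical day 7 LUS scores in patients with and without PMV.
    print(empirical_auc([30, 34, 28, 40], [18, 22, 29, 20]))
```

For variables where lower values indicate the outcome (for example, a lower P/F ratio for PMV, or a lower LUS score for successful extubation), the same function applies after negating the values, which is equivalent to reporting 1 − AUC.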
The optimal cut-off value for predicting each outcome was determined as the value that maximised the Youden index.
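The Youden index is J = sensitivity + specificity − 1, evaluated at every candidate threshold. The sketch below scans the observed values as thresholds and treats a value at or above the cut-off as a positive test; the tie-breaking rule (keep the lowest cut-off) and the function name are assumptions, not details from the study.

```python
from __future__ import annotations


def youden_cutoff(with_outcome: list[float],
                  without_outcome: list[float]) -> tuple[float, float, float, float]:
    """Return (cut-off, J, sensitivity, specificity) maximising J = sens + spec - 1.

    A value >= cut-off is treated as a positive test; ties in J keep the lowest cut-off.
    """
    if not with_outcome or not without_outcome:
        raise ValueError("both groups need at least one observation")
    best = None
    for cut in sorted(set(with_outcome) | set(without_outcome)):
        sens = sum(x >= cut for x in with_outcome) / len(with_outcome)
        spec = sum(y < cut for y in without_outcome) / len(without_outcome)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cut, j, sens, spec)
    return best


if __name__ == "__main__":
    # Hypothetical LUS scores: PMV patients versus non-PMV patients.
    print(youden_cutoff([20, 25, 30], [10, 12, 22]))
```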
The IRR of the LUS score was analysed using the kappa statistic [26], and kappa values were interpreted using Altman's benchmark scale [27]. The acceptable limit for kappa in this study was ≥ 0.41, corresponding to moderate agreement.
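The text does not state which kappa variant was used. For two raters assigning categorical codes (such as the 0/1/2 change categories), unweighted Cohen's kappa is one standard choice; with several raters scoring the same images, Fleiss' kappa would be an alternative. The sketch below assumes unweighted Cohen's kappa; the function name and example ratings are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1.0 - expected)


ACCEPTABLE_KAPPA = 0.41  # threshold stated in the text (moderate agreement)

if __name__ == "__main__":
    # Hypothetical change categories (0 no change, 1 decrease, 2 increase) for six patients.
    k = cohens_kappa([0, 1, 2, 2, 1, 0], [0, 2, 2, 1, 1, 0])
    print(round(k, 2), k >= ACCEPTABLE_KAPPA)
```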

Clinical characteristics
After six patients were excluded owing to missing ultrasound data, 20 patients with COVID-19 who met the inclusion criteria were analysed. Table 1 summarises the baseline clinical characteristics of the patients with (n = 11) and without (n = 9) PMV. The patients comprised 16 men and 4 women, with a median age of 66 years (range, 56.3-73.8 years). The median time from symptom onset to hospital admission was 7 (range, 4-10) days. There were no significant differences in age, sex, body mass index, time from first symptom, pre-existing conditions, Acute Physiology and Chronic Health Evaluation II score, treatment, complications, outcome, or fluid balance. Among the imaging and laboratory findings, the LUS score on days 3 and 7, XP score on day 7, and P/F ratio on day 7 differed significantly between the groups (p < 0.05). The presence of fibrosis on CT at admission did not differ between the groups.

ROC analysis
We conducted ROC curve analysis to assess the predictive value of the LUS scores on days 3 and 7, XP score on day 7, and P/F ratio on day 7 for PMV, and of the LUS scores on days 3 and 7, CRP level on day 7, and P/F ratio on day 7 for successful extubation.
Analysis of the serial LUS score
Figure 2a shows the variations in LUS scores on days 1, 3, and 7 in the PMV and non-PMV groups. The serial LUS scores differed significantly between the PMV and non-PMV groups (p < 0.001, two-way ANOVA). The LUS score on day 7 was significantly higher than that on day 1 in the PMV group (p < 0.05, Kruskal-Wallis test) but significantly lower in the non-PMV group (p < 0.05, Kruskal-Wallis test). Similarly, the variations in LUS scores on days 1, 3, and 7 differed significantly between the successful and non-successful extubation groups (p = 0.001, two-way ANOVA; Fig. 2b). The LUS score on day 7 was significantly lower than that on day 1 in the successful extubation group (p < 0.05, Kruskal-Wallis test) but significantly higher in the non-successful extubation group (p < 0.05, Kruskal-Wallis test). Figure 3 shows examples of PMV and successful extubation cases monitored using ultrasound and CT, illustrating that higher LUS scores on day 7 accompanied PMV, whereas lower LUS scores on day 7 accompanied successful extubation in patients with severe COVID-19.

IRR
There was no IRR agreement in the LUS score between well-experienced and less-experienced emergency physician examiners (κ = 0.01 [95% CI: 0-0.04]). There was fair IRR agreement in the LUS score changes on day 1

Discussion
Several studies on the use of LUS in COVID-19 patients, which use CT as the reference standard, have indicated that LUS on admission may predict mortality or the need for invasive mechanical ventilation [13,14,23].
However, few studies have assessed whether serial LUS scores can predict the course of lung injury. In this study, we showed that a higher LUS score on day 7 predicted PMV, while a lower LUS score on day 7 predicted successful extubation in patients with severe COVID-19.
If patients receive PMV, they are usually excluded as candidates for ECMO, and with limited resources during a pandemic, this may be considered a withdrawal of treatment [4,5]. The rapid surge of medical needs depletes the ventilators and ICU beds, making the use of anaesthetic machines instead of ventilators compulsory [6].
Therefore, it is very important to predict whether patients will require PMV or can be extubated if ventilatory management becomes necessary. If we can predict the need for PMV early, we can consider transferring the patient to an ECMO centre before ECMO is no longer applicable. Furthermore, the ability to predict PMV allows for the appropriate allocation of medical resources, including ICU beds.
Gattinoni et al. described variations in the respiratory mechanics profiles of invasively ventilated patients with COVID-19 pneumonitis and identified two clinical phenotypes: (1) type L, characterised by low elastance, a low ventilation-to-perfusion ratio, low lung weight, and low recruitability; and (2) type H, characterised by high elastance, a pronounced right-to-left shunt, high lung weight, and high recruitability [28]. The transition from type L to type H may result from worsening COVID-19 or from lung injury caused by high-stress ventilation or patient self-inflicted lung injury (P-SILI) [29,30]. The depth of the negative intrathoracic pressure may also play a key role in this phenotype shift. If P-SILI is a concern in patients with COVID-19, early intubation is recommended, together with adequate sedation and analgesia to suppress spontaneous breathing [30,31]. However, the patient's condition should be evaluated to determine how long the lungs should be rested and when spontaneous breathing should be resumed. Excessive sedation and analgesia may result in unsuccessful extubation, which is a risk factor for PMV [32].
Follow-up CT in patients with ARDS, including those with COVID-19, can demonstrate the progression of lung pathology [9-11,33]. Pulmonary fibroproliferation assessed using CT in patients with ARDS, including ARDS induced by COVID-19, predicts increased mortality and increased susceptibility to multiple organ failure, including ventilator dependency and its associated outcomes [8,9,34]. However, in a pandemic, transporting critically ill ventilated patients to radiology facilities is challenging, especially for patients on ECMO [11,12]. LUS is a fast, non-invasive, sensitive, and quantitative tool for assessing multiple pulmonary pathologies, such as pulmonary oedema, pneumonia, and interstitial lung disease [35]. Recent studies have shown that LUS findings are similar to chest CT findings and superior to chest radiography findings for evaluating patients with COVID-19 [36,37]. We previously reported the usefulness of LUS as the sole bedside imaging modality for the timely identification of pulmonary conditions, which reduces the risk of moving unstable ECMO patients [14]. In this study, fluid balance and cardiac function did not differ according to outcome. Therefore, we believe that worsening LUS scores can be used to evaluate lung injury, such as fibrosis. Although the sialylated carbohydrate antigen KL-6 is commonly used as a biomarker of lung fibrosis and can predict severity in patients with COVID-19 [39], it did not differ significantly between the PMV and non-PMV groups in our study. Unlike the previous report [39], our study included only severely ill patients; KL-6 levels were high overall, which may reflect disease severity rather than prognosis.
A few reports have described patients with COVID-19 who met the usual extubation criteria but were subsequently reintubated, with CT at reintubation showing progressive lung fibrosis [13,40]. In our study, three patients in the PMV group met our extubation criteria and were extubated but were reintubated within 7 days. The reintubation may have been caused by an increased respiratory workload owing to lung fibrosis. The success rate of extubation has been reported to be higher when respiratory effort and diaphragmatic muscle strength are evaluated in addition to the conventional extubation criteria [41,42].
Based on our results and previous reports of ultrasound evaluation of the diaphragm [41], we believe that ultrasound assessment may be considered in future extubation criteria.
Another finding of our study was that although LUS appeared useful, IRR agreement was low. Previous studies have reported high IRR for B-lines in pulmonary oedema but low IRR for pleural thickening and the pleural abnormalities observed in ARDS, including COVID-19, and in lung fibrosis [15,43]. Our results are similar to those of previous studies [15]. Although LUS is a rapid, bedside, goal-oriented diagnostic test, inter-rater variation is a common problem in point-of-care ultrasound. To overcome this problem, we believe that studies of automated ultrasound interpretation, such as deep learning, are important [44,45].
This study had some limitations. First, this was a single-centre study with a relatively small sample size, which may limit the generalisability of our results; further multicentre studies with larger samples are needed to confirm our findings. Second, respiratory muscle strength, including diaphragmatic function, may affect PMV and successful extubation, but it was not assessed in this study. Finally, a daily comparison between the LUS score and chest CT was not performed because CT imaging data were extremely limited (available almost only on admission).

Conclusions
In this study, we showed that a higher LUS score on day 7 predicted PMV, while a lower LUS score on day 7 predicted successful extubation in patients with severe COVID-19. However, although LUS was useful, IRR agreement among examiners was low.

Figure 1. Receiver operating characteristic curve analysis for predicting PMV and successful extubation. PMV, prolonged mechanical ventilation; AUC, area under the curve; CI, confidence interval; LUS, lung ultrasound; XP score, X-ray score; P/F ratio, PaO2/FiO2 ratio; CRP, C-reactive protein.

Figure 2. Analysis of variance of the LUS score. PMV, prolonged mechanical ventilation; LUS, lung ultrasound.