Prognostic Predictors For COVID-19 in Daily Clinical Practice in Japan: a Propensity Score-Matched Case-Control Study

Introduction: Blood tests and computed tomography (CT) findings at diagnosis are widely used in daily clinical practice and can offer useful prognostic factors for coronavirus disease 2019. Methods: We retrospectively evaluated 66 patients who underwent a blood test and CT between January 1 and May 31, 2020, and performed a propensity score-matched case-control study. Cases and controls were a severe respiratory failure group (non-rebreather mask, nasal high-flow, positive-pressure ventilation) and a non-severe respiratory failure group, matched at a ratio of 1:3 by propensity scores constructed by age, sex, and medical history. We compared groups for maximum body temperature up to diagnosis, laboratory findings, and CT findings in the matched cohort. Two-tailed P-values <0.05 were considered statistically significant. Results: Nine cases and 27 controls were included in the matched cohort. Significant differences were seen in maximum body temperature up to diagnosis (p=0.0043), the number of shaded lobes (p=0.0434), amount of ground-glass opacity (GGO) in the total lung field (p=0.0071), amounts of GGO (p=0.0001), and consolidation (p=0.0036) in the upper lung field, and pleural effusion (p=0.0117). Conclusions: Fever and CT findings (such as GGO and consolidation) may be prognostic indicators that can be easily measured at diagnosis.


Introduction
Since the first death from coronavirus disease 2019 (COVID-19) pneumonia was reported in Wuhan, China, in December 2019 1), the epidemic has spread to become a global pandemic, with no end yet in sight. As of May 31, 2021, a total of 169,597,415 confirmed cases and 3,530,582 deaths related to COVID-19 had been reported to the World Health Organization, with an apparent mortality rate of around 2.1% 2). As of the same date, a total of 745,392 people had been infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Japan, with 12,926 deaths 3). Pneumonia has been a common complication in COVID-19 4). Although many patients show mild disease, some patients develop acute respiratory distress syndrome (ARDS) and progress to severe disease within 8-14 days after onset 5). Various radiological patterns of COVID-19 pneumonia have been reported at different times throughout the disease course 6)7)8), which may make the progression of the disease difficult to predict.
Aggravating factors reported to date include age 9) , obesity 10) , type 2 diabetes 11) , hypertension 12)13) , chronic kidney disease 14) , and dyslipidemia 15) , but among patients of the same age, physique, and underlying illness, illness can still range from mild to severe.
In Japan, patients at risk of aggravation are supposed to be hospitalized in principle, but as the number of patients with COVID-19 increases, tools for early prediction of patients likely to experience severe illness are expected to be developed to more appropriately allocate limited medical resources. Several prognostic scores have been developed in other countries for inpatients, such as COVID-GRAM 16) and the ISARIC WHO 4C Mortality Score 17). Studies have also shown the utility of community-acquired pneumonia severity scores such as A-DROP 18). However, these are not yet sufficient and do not include the spread of opacities or radiological findings of pneumonia, which are speculated to be important factors in aggravated COVID-19.
We, therefore, performed a matched case-control study using propensity scores to analyze prognostic factors from among the results of physical examinations, blood tests, and CT at diagnosis in daily clinical practice.

Results
Twenty-seven patients had received oxygen therapy or ventilation management. Two patients received nasal high-flow therapy, three had NIPPV, seven had mechanical ventilation, and two had mechanical ventilation and extracorporeal membrane oxygenation. Thirty-six patients received antiviral treatments such as favipiravir and/or remdesivir and 50 patients were treated with corticosteroids. Four patients (6.1%) died.
Distributions of baseline characteristics in patients from the severe and non-severe respiratory failure groups are shown in Table 1 for both unmatched and matched samples. The area under the ROC curve of propensity scores for severe respiratory failure was 0.83 (95% confidence interval, 0.72-0.94).
In the unmatched sample, the severe respiratory failure group was older, with higher BMI and a higher prevalence of diabetes. There was no significant difference in the period from onset to diagnosis (median [IQR] 4.00 [3.00-6.75] days in non-severe respiratory failure group vs 5.00 [4.00-7.00] days in severe respiratory failure group, p=0.448).
Although baseline characteristics seemed well balanced between severe and non-severe respiratory failure groups after propensity score matching, absolute standardized differences remained at 0.1 or more for sex and history of diabetes mellitus (Table 1).
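The balance check mentioned above can be illustrated with a minimal sketch. It assumes the commonly used formula for the standardized difference of a binary covariate (difference in proportions divided by the square root of the average of the two binomial variances); the source does not state which formula was applied, and the function name and the proportions used below are hypothetical.

```python
import math

def abs_std_diff_binary(p_case, p_control):
    """Absolute standardized difference for a binary covariate.

    p_case and p_control are the proportions of patients with the
    characteristic in each group (hypothetical inputs, not study data).
    """
    # Average of the two binomial variances (an assumed, commonly used form)
    pooled_var = (p_case * (1 - p_case) + p_control * (1 - p_control)) / 2
    if pooled_var == 0:
        # Both proportions are 0 or 1: identical groups give 0, otherwise undefined
        return 0.0 if p_case == p_control else math.inf
    return abs(p_case - p_control) / math.sqrt(pooled_var)

# A covariate would be flagged as imbalanced under the 0.1 threshold
imbalanced = abs_std_diff_binary(0.6, 0.4) >= 0.1
```

A value below 0.1 is the threshold the text uses for adequate balance, which is why sex and diabetes were handled by additional regression adjustment.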
Table 2 shows the results of maximum fever up to diagnosis, blood findings, and CT findings at the first visit. CRP, LDH, uric acid, and fibrinogen were more elevated in the severe respiratory failure group. There was no significant difference in the period from the onset to the day CT was performed (median [IQR] 4.50 [3.00-6.75] days in non-severe respiratory failure group vs 5.00 [4.00-6.25] days in severe respiratory failure group, p=0.671). GGO was more extensive than consolidation in both non-severe and severe respiratory failure groups.
Distributions of both GGO and consolidation increased toward the bottom of the lungs. Opacities were almost always evident bilaterally. In the severe respiratory failure group, shadows were more frequently multilobed. Mediastinal lymphadenopathy was observed in 34 cases (51.5%) and tended to be more common in the severe respiratory failure group. Two patients classed as serious cases showed pleural effusion. Pleural thickening was observed in 11 cases (16.7%) and tended to be more common in the non-severe respiratory failure group. Twenty-one cases (31.8%) showed subpleural curvilinear lines/stripe shadows. Vascular enlargement was observed in two cases (3.0%). Traction bronchiectasis was observed in one severe case. Among matched patients, maximum body temperature up to diagnosis, number of shaded lung lobes, amount of GGO in the total lung field, amount of GGO in the upper lung field, amount of consolidation in the upper lung field, and presence of pleural effusion differed significantly. Severe respiratory failure was significantly associated with increases in these factors, except pleural effusion, on multivariate linear regression after adjusting for the factors used to construct the propensity scores in the matched cohort (Table 3). The optimal cutoff for GGO in the total lung field on logistic regression probabilities was 14.6% (Figure 2).

Discussion
In our study, which used propensity score matching to reduce bias due to confounding factors by matching patients for baseline variables using multivariable logistic regression modeling, significant differences were seen in maximum body temperature up to diagnosis, the number of shaded lung lobes, amount of GGO in the total lung field, amount of GGO in the upper lung field, amount of consolidation in the upper lung field, and presence of pleural effusion. Blood findings did not differ significantly, and fever and radiological findings were more prognostic at diagnosis. COVID-19 is considered a disease for which the prognosis changes significantly depending on age and underlying disease, so matching patient background characteristics as confounding factors and extracting prognosis-related factors from physical and laboratory findings is very important.
Regarding factors that may be confounders for prognosis in the patient background, Wu et al. 9) found that older patients and those with more comorbidities were at greater risk of developing ARDS among patients infected with COVID-19. Patients with COVID-19 and diabetes are known to progress to severe illness 11).
Liang et al. 19) found that patients with COVID-19 and cancer showed a higher risk of severe events compared with patients without cancer. A dose-response meta-analysis by Pranata et al. 10) showed that increased BMI was associated with an increased frequency of poor outcomes among patients with COVID-19. Other prognostic factors included hypertension 12)13), dyslipidemia 14), chronic kidney disease 15), and chronic obstructive pulmonary disease 20). In this study, significant differences were seen in age, BMI, and diabetes, as in previous reports, but not in hypertension, dyslipidemia, or chronic renal failure. None of the patients in this study had cancer, so the influence of cancer could not be evaluated.
In terms of radiographic findings, the number of shaded lung lobes, amount of GGO in the total lung field, amount of GGO in the upper lung field, amount of consolidation in the upper lung field, and presence of pleural effusion differed significantly after propensity score matching.
In the early stages of COVID-19, lesions usually manifest as localized GGO, subpleural bands, vascular enlargement, and peripheral distribution on CT. This is because the virus invades and replicates in the alveolar epithelium and exudates leak mainly into the alveolar space, with distributions mainly under the pleura or around the peribronchovascular regions 21)22). As the disease progresses, the range of involved alveoli and mucosa increases, and bronchial walls swell, contributing to patterns of air bronchograms with consolidation and bronchial wall thickening. Patterns of crazy-paving, interlobular septal thickening, and reticulation basically reflect the involvement of pulmonary interstitium, such as interlobular interstitial edema 23).
In most patients, the lower lobes are involved more frequently than the upper and middle lobes. A peripheral distribution, multilobar involvement, and posterior involvement are other important characteristics of lesion distribution 24).
According to a report by Ye et al. 4), pleural effusion was observed in about 5%, pleural thickening in about 30%, subpleural curvilinear lines in about 20%, and lymphadenopathy in 4-8%. Subpleural curvilinear lines are reportedly related to pulmonary edema or fibrosis 4). Stripe shadows were thought to be related to the healing of pulmonary chronic inflammation or proliferative disease 4), with gradual replacement of cellular components by scar tissue. Vascular enlargement is described as the dilation of pulmonary vessels around and within the lesions, which might be attributed to damage to and swelling of the capillary wall caused by pro-inflammatory factors 4). Song et al. 25) found more consolidative lesions in patients with a longer interval between symptom onset and CT or with age over 50 years and suggested that this manifestation may warrant greater attention in the management of patients. In a meta-analysis of chest features on CT by Zheng et al. 24), some CT manifestations were more frequent in severe patients than in common patients, such as traction bronchiectasis, interlobular septal thickening, consolidation, crazy-paving pattern, reticulation, pleural effusion, and lymphadenopathy.
However, these findings are not seen in the early stages of the disease and are not considered predictors of prognosis in the early stage. In addition, relatively few reports have provided details of radiological findings in the early stages of the disease.
In our study, significant differences were observed in the amount of GGO in the total lung field, the amounts of GGO and consolidation in the upper lung field, and the presence of pleural effusion. Traction bronchiectasis, as a finding in the advanced phase, was rarely observed. No significant differences in lymphadenopathy were identified. This was probably because lymph nodes in this study were rarely enlarged to a short-axis diameter of 1 cm or more.
Based on the present results, early predictors of prognosis could be widespread GGO, GGO extending to the upper lung field, early presence of consolidation in atypical sites (a finding more frequently seen in the late phase), and pleural effusion, which is usually seen in the late phase.
In this study, maximum body temperature up to diagnosis was a predictor of prognosis. Fever was considered a predictor of mortality by Iftime et al. 26), and the same result was obtained in this study.
Zhou et al. 27) reported that elevated white blood cell count, decreased lymphocyte count, thrombocytopenia, elevated lactate dehydrogenase, elevated creatinine, elevated D-dimer, elevated ferritin, elevated high-sensitivity cardiac troponin I, and elevated interleukin (IL)-6 were seen in patients who died from COVID-19.
Ayanian et al. 28) reported that monitoring IL-6, D-dimer, CRP, LDH, and ferritin was clinically useful in this respect, particularly when these markers were above the mentioned cutoff values. d'Alessandro et al. 29) reported KL-6 measurement as potentially useful for evaluating the prognosis of COVID-19 patients. Sugiyama et al. 30) reported that elevated IFN-λ3, elevated IP-10, elevated C-X-C motif chemokine ligand 9 (CXCL9), and low thymus and activation-regulated chemokine (TARC) could be promising prognostic markers to distinguish between mild/moderate and severe/critical patients.
In this study, no significant differences in blood test findings were confirmed at the first visit in daily clinical practice, indicating the difficulty of predicting disease severity from blood tests at the first visit and revealing fever and radiological findings as more important factors than blood tests.
Existing prognosis prediction scores for COVID-19, a disease that shows few subjective symptoms in the early stage yet can progress, seem to depend on severe symptoms and hypoxemia, and may be unsuitable for predicting prognosis at an earlier stage. In addition, these scores do not include CT findings, and the results of this study suggest that adding factors for the distribution of GGO and consolidation at an early stage would improve them. GGO and consolidation can generally be quantified by installing dedicated software.
In addition, given that the cutoff for the ratio of GGO to the entire lung field separating severe from non-severe cases was 14.6%, the following may serve as convenient indexes: GGO that occupies one-sixth of the total lung field on chest X-ray, the presence of GGO in the upper lung field, and the presence of consolidation in the upper lung field.
Due to the urgency of the current global situation, trials that provide real-world evidence are more important than randomized controlled trials, which usually take months to complete and are too slow in the midst of a pandemic. Real-world evidence is key to answering the question of drug effectiveness. Various medicines have been tried against COVID-19, and we plan to examine the efficacy, optimal dose, and optimal duration of administering drugs such as steroids that are currently in wide use by performing propensity score matching using the predictors in this study.
This study has several limitations. First, our hospital mainly treated patients with mild to moderate illness, and selection bias may thus have affected the enrollment of patients. This was also related to the slightly lower frequency of comorbidities. Second, although we matched patient characteristics using propensity scores and by balancing variables, unmeasured confounders could not be balanced between cases and controls in the present study. Third, the present study was a single-center experience, and the generalizability of the results is limited. A multicenter study is required to confirm these results.

Conclusion
Among findings available at the diagnosis of COVID-19 in daily clinical practice, maximum body temperature up to diagnosis and radiological findings (number of shaded lung lobes, amount of GGO in the total lung field, amount of GGO and consolidation in the upper lung field, and presence of pleural effusion), rather than blood test findings, could offer useful prognostic predictors.

Materials and Methods
This study was approved by the ethics committee of Tokyo Shinagawa Hospital (approval number 20-A-06) and was performed in accordance with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. The requirement to obtain informed consent for participation was waived due to the retrospective nature of this study. We retrospectively enrolled 66 patients with a final diagnosis of COVID-19 pneumonia who had undergone blood testing and chest CT from the early to middle phase (from onset to 10 days; if CT was performed multiple times during this period, only the CT performed earliest within the target period was analyzed) in our hospital between January 1 and May 31, 2020. COVID-19 was confirmed by reverse transcription-polymerase chain reaction (RT-PCR) or loop-mediated isothermal amplification. Based on the previous reports 6)7), we quantified ground-glass opacity (GGO) and consolidation, as the main components of COVID-19 pneumonia, and analyzed shadow spread (uni- or bilateral, how many lobes showing shadows), mediastinal lymphadenopathy, traction bronchiectasis, stripe shadows/subpleural curvilinear lines, pleural thickening, vascular enlargement, pleural effusion, and pericardial effusion.
In terms of laboratory findings, we analyzed white blood cell counts, lymphocyte counts, platelet counts, lactate dehydrogenase (LDH) isozymes, uric acid, C-reactive protein (CRP), fibrinogen, and D-dimer measured at diagnosis. For physical examination, fever that could be investigated from past medical records was analyzed.

Protocols for chest CT
All images were obtained on one of two CT systems (Revolution Maxima; GE Healthcare, Chicago, IL, USA or Aquilion ONE; Canon, Tokyo, Japan) with the patient supine. The main scanning parameters were as follows: tube voltage, 120 kVp; automatic tube current modulation, 40-140 mAs; pitch, 0.828-1.375 mm; matrix, 512 × 512; slice thickness, 2.5-3.0 mm; and field of view, 400 × 400 mm. All images were then reconstructed with a slice thickness of 0.5-1.25 mm and 1-mm increments.

Image interpretation
Images were analyzed by three senior fellows of the Japanese Respiratory Society. Evaluators independently and freely assessed CT features using axial CT, multiplanar reconstruction images, and high-resolution CT.
After separate evaluations, any disagreements were resolved by discussion, with a consensus decision reached by all three evaluators.

Calculation of volumes of GGOs and consolidation
We used a tool newly developed by Fujitsu Limited (Tokyo, Japan) to annotate shadows manually and automatically calculated the volume based on annotation results (Fig. 1). Specifically, we selected 30 images including lung fields from one series of CT images for each case at as close to equal intervals as possible to calculate volumes of GGO and consolidation. All CT images were monochrome, 512 pixels × 512 pixels on the axial view and including both lungs. The following processing was performed on each selected image:
1. Division of image into 1024 (32 × 32) patches, each as a 16 pixel × 16 pixel square.
2. Confirmation of CT images by senior fellows of the Japanese Respiratory Society and identification of each patch as "Consolidation due to COVID-19 pneumonia", "GGO due to COVID-19 pneumonia", "Other lung", or "Extrapulmonary field".
CT images were rendered into 3 dimensions by linear interpolation between the selected 30 images, and patches were also rendered into 3 dimensions. Volumes of patches classified as "Consolidation due to COVID-19 pneumonia", "GGO due to COVID-19 pneumonia" or "Other lung" were obtained.
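The patch-based ratio calculation can be approximated with a short sketch. This is not the Fujitsu tool: it assumes equally spaced slices and equally sized patches, so that patch counts are proportional to volume, and it does not model the linear interpolation between slices. The label codes and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical single-letter codes for the four patch classes named in the text
CONSOLIDATION, GGO, OTHER_LUNG, EXTRAPULMONARY = "C", "G", "L", "X"

def lung_volume_ratios(slices):
    """Return (GGO ratio, consolidation ratio) relative to the whole lung.

    `slices` is a list of 2D grids (lists of rows) of patch labels,
    one grid per selected CT image (32 x 32 patches in the study).
    Assumes equal slice spacing and patch size, so counts stand in for volume.
    """
    counts = Counter(label for grid in slices for row in grid for label in row)
    # Whole lung = consolidation + GGO + other lung; extrapulmonary is excluded
    lung = counts[CONSOLIDATION] + counts[GGO] + counts[OTHER_LUNG]
    if lung == 0:
        raise ValueError("no lung patches were labeled")
    return counts[GGO] / lung, counts[CONSOLIDATION] / lung
```

Restricting `slices` to the images covering the upper, middle, or lower lung field would give the per-field ratios in the same way.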

Statistical analysis
We selected a propensity score-matched case-control study design to evaluate prognostic predictors in radiographic and blood test findings at diagnosis in COVID-19. We defined a severe respiratory failure group (non-rebreather mask, nasal high-flow, noninvasive positive-pressure ventilation (NIPPV) or invasive positive-pressure ventilation) as cases and a non-severe respiratory failure group (nasal cannula or mask or no oxygen administration) as controls. A multivariable logistic regression model was used to construct case propensity scores. Confounding patient characteristics reported to affect prognosis in previous studies 9)10)11)12)13)14)15) were included in the construction of propensity scores: 1) sex; 2) age; 3) body mass index (BMI); 4) presence of hypertension; 5) presence of diabetes; 6) presence of dyslipidemia; and 7) presence of chronic renal failure. Age and BMI were used as binary data (age over 60 or not; BMI over 25 or not) for the construction of propensity scores.
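As an illustration of how a propensity score is obtained from binary covariates, the following is a minimal sketch of logistic regression fitted by plain batch gradient ascent. The study used Stata, whose fitting procedure is not reproduced here; the function names, learning rate, and iteration count are assumptions chosen for the toy example.

```python
import math

def propensity_score(weights, row):
    """Predicted probability of being a case; weights[0] is the intercept."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit a logistic model by batch gradient ascent on the log-likelihood.

    X is a list of rows of 0/1 covariates (e.g. age over 60, BMI over 25),
    y is a list of 0/1 outcomes (1 = severe respiratory failure).
    """
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(weights)
        for row, outcome in zip(X, y):
            err = outcome - propensity_score(weights, row)
            grad[0] += err
            for j, value in enumerate(row, start=1):
                grad[j] += err * value
        # Step along the average gradient of the log-likelihood
        weights = [w + lr * g / len(X) for w, g in zip(weights, grad)]
    return weights
```

With a single binary covariate, the fitted scores converge to the observed case proportion in each covariate stratum, which is a convenient sanity check.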
Propensity score-matched groups were created at a ratio of 1:3 based on the nearest neighbor matching algorithm with replacement, using a caliper of 0.2 standard deviations of the logit of the propensity score. Discrimination (i.e., the capability to classify individuals with and without events) was evaluated by the C-statistic or the area under the receiver operating characteristic curve. We compared the severe and non-severe respiratory failure groups for maximum body temperature up to diagnosis, laboratory findings, and CT findings (uni- or bilateral, number of lobes with shadows, the volume ratio of GGO to whole lung field, the volume ratio of consolidation to whole lung field, volume ratios of GGO to each of the upper, middle and lower lung fields, volume ratios of consolidation to each of the upper, middle and lower lung fields, mediastinal lymphadenopathy (including those with short-axis diameter <1 cm), traction bronchiectasis, fibrous stripes/subpleural curvilinear lines, pleural thickening, vascular enlargement, pleural effusion, and pericardial effusion).
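The matching rule described above (1:3 nearest neighbor, caliper of 0.2 standard deviations of the logit, with replacement) might be sketched as follows. The function names are hypothetical, and computing the standard deviation over the logits of all subjects is an assumption, since the text does not specify which sample it was taken from.

```python
import math
import statistics

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def match_with_replacement(case_ps, control_ps, ratio=3, caliper_sd=0.2):
    """Return {case index: [matched control indices]}.

    Each case takes up to `ratio` nearest controls on the logit scale
    that lie within the caliper. Controls may be reused across cases.
    """
    case_logit = [logit(p) for p in case_ps]
    control_logit = [logit(p) for p in control_ps]
    # Assumption: caliper uses the SD of the logit over all subjects
    caliper = caliper_sd * statistics.stdev(case_logit + control_logit)
    matches = {}
    for i, lc in enumerate(case_logit):
        ranked = sorted(range(len(control_logit)),
                        key=lambda j: abs(control_logit[j] - lc))
        within = [j for j in ranked if abs(control_logit[j] - lc) <= caliper]
        matches[i] = within[:ratio]
    return matches
```

A case with fewer than three controls inside the caliper simply receives fewer matches in this sketch; how such cases were handled in the study is not stated.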
Continuous and categorical variables of patient characteristics were compared using Student's t-test and the χ² test, respectively, within each set. Ratios of GGO and consolidation to total/upper/middle/lower lung field were compared using the Mann-Whitney U test. Multivariate linear regression was also used to evaluate whether significantly different factors were associated with severe/non-severe respiratory failure, both because absolute standardized differences in the patient characteristics used for constructing propensity scores remained insufficient between propensity score-matched cases and controls and to confirm double robustness by combining the propensity score-matched study with multivariate linear regression analyses.
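For readers unfamiliar with the Mann-Whitney U test used for the lung field ratios, a minimal sketch follows. It uses midranks for ties and a normal approximation with tie correction and no continuity correction; these choices are assumptions and may differ from the procedure used in the statistical software, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1..j and share their average rank
        midrank = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p
```

With samples as small as nine cases, an exact test would be preferable to the normal approximation shown here.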
Moreover, we used likelihoods to plot the receiver operating characteristic (ROC) curve and identified optimal cut-offs for predicted probability by maximizing the Youden index. Two-tailed P-values <0.05 were considered significant. Data were analyzed using STATA® version 16 (StataCorp, College Station, TX, USA). All files of extracted data were encoded to anonymize patient data and prevent personal identification.
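Selecting a cut-off by maximizing the Youden index (sensitivity + specificity - 1) can be sketched as below. The function name is hypothetical, and `scores` may be either predicted probabilities or a raw measurement such as the GGO-to-total lung field ratio; a patient is classed as positive when the score is at or above the cut-off.

```python
def youden_optimal_cutoff(scores, labels):
    """Return (cutoff, Youden index) maximizing sensitivity + specificity - 1.

    labels: 1 = severe respiratory failure (case), 0 = non-severe (control).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best_cutoff, best_j = None, -1.0
    # Every observed score is a candidate threshold
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

When several thresholds tie on the Youden index, this sketch keeps the lowest one, which favors sensitivity.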

Figure legends
Figure 1 The annotation tool developed by Fujitsu Limited to annotate shadows manually for automatic calculation of volume.

Figure 2
ROC curve for GGO/total lung field and sensitivity/specificity as a function of probability cut-off. The optimal cutoff on logistic regression probabilities was the intersection point reflecting the optimal balance between sensitivity and specificity (GGOs/total lung field 14.6%).

Table 1
Baseline characteristics in unmatched and matched patients. Note. A two-tailed p-value < 0.05 was considered to indicate statistical significance.

Table 2
Outcomes of comparison between non-severe and severe respiratory failure groups in unmatched and matched patients. Note. A two-tailed p-value < 0.05 was considered to indicate statistical significance.

Table 3
Association with maximum body temperature until diagnosis, CT findings (lobe, ground glass opacity-to-total lung field ratio, ground glass opacity-to-upper lung field ratio, and consolidation-to-upper lung field ratio) on multivariate linear regressions in the matched cohort. Adjusted for patient characteristics used for the construction of propensity scores (age, sex, BMI, diabetes, hypertension, dyslipidemia, and chronic renal failure). A two-tailed P-value < 0.05 was considered to indicate statistical significance.