Discriminant models for the prediction of viral shedding time and disease progression in COVID-19

Summary: This study characterizes the clinical features associated with postponed viral shedding and disease progression, then develops and validates two prognostic models with satisfactory discriminant performance.

Abstract
Background: COVID-19 infection can cause life-threatening respiratory disease. This study aimed to characterize the clinical features associated with postponed viral shedding and disease progression, and then to develop and validate two prognostic discriminant models.
Methods: This study included 125 hospitalized patients with COVID-19. Forty-four parameters were recorded, including age, gender, underlying comorbidities, epidemic features, laboratory indexes, imaging characteristics and therapeutic regimen. The F-test and χ² test were used for feature selection. All models were developed with 4-fold cross-validation, and the final performance of each model was compared by the area under the receiver operating characteristic curve (AUROC). After optimizing the parameters via L2 regularization, prognostic discriminant models were built to predict postponed viral shedding and disease progression of COVID-19 infection. The test set was then used to assess predictive value in terms of model sensitivity and specificity.
Results: Sixty-nine patients had a postponed viral shedding time (>14 days), and 28 of 125 patients progressed to severe disease. Eleven and six demographic, clinical and therapeutic features were significantly associated with postponed viral shedding and disease progression, respectively (p < 0.05).
The optimal discriminant models are:
y1 (postponed viral shedding) = -0.244 + 0.2829x1 (interval from the onset of symptoms to antiviral treatment) + 0.2306x4 (age) + 0.234x28 (urea) - 0.2847x34 (dual-antiviral therapy) + 0.3084x38 (treatment with antibiotics) + 0.3025x21 (treatment with
y2 (disease progression) = -0.348 - 0.099x2 (interval from Jan 1st, 2020 to individual onset of symptoms) + 0.0945x4 (age) + 0.1176x5 (imaging characteristics) + 0.0398x8 (short-term exposure to Wuhan) - 0.1646x19 (lymphocyte count) + 0.0914x20 (neutrophil count) + 0.1254x21 (neutrophil/lymphocyte ratio) + 0.1397x22 (C-reactive protein) + 0.0814x23 (procalcitonin) + 0.1294x24 (lactic dehydrogenase) + 0.1099x29 (creatine kinase).
An output ≥ 0 predicts postponed viral shedding or progression to a severe/critical state. These two models yielded the maximum AUROC and fared best in terms of prognostic performance (sensitivity of 73.3% and 75%, and specificity of 78.6% and 75%, for prediction of postponed viral shedding and disease severity, respectively). Conclusion: The two discriminant models could effectively predict postponed viral shedding and disease severity, and could be used as early-warning tools for COVID-19.


Introduction
The prevalence of coronavirus disease 2019 (COVID-19) has placed a huge burden on medical resources [1]. Although most patients with COVID-19 infection present as non-severe cases, the infection can also cause life-threatening conditions before or during hospitalization, such as severe pneumonia, adult respiratory distress syndrome or multiple organ failure, all of which are related to worse outcomes [2]. Compared with other epidemic diseases, such as the previous outbreaks of SARS-CoV and MERS-CoV, COVID-19 progresses and spreads more rapidly, with peculiar epidemiological traits. High viral loads of SARS-CoV-2 were observed in the upper respiratory specimens of patients with few or no symptoms, indicating that inapparent transmission plays a major but underestimated role in sustaining the outbreak of COVID-19 [3].
Thus, to optimize therapeutic strategies and to effectively control transmission in cities or regions with imported cases, prompt identification of viral shedding time and disease severity is required but challenging. Traditional scoring tools, such as CURB-65, qSOFA, and NEWS, can be used to assess disease severity, but are not suited to the early assessment of COVID-19 severity [4]. To date, a number of models predicting the risk of severe COVID-19 have been developed [5][6][7].
However, there are geographic discrepancies in the severity and mortality rate of patients with COVID-19 infection, and most of the previous prediction models were established to predict survival risk or progression to a severe or critical state in southern China.
Few of them were designed to predict postponed viral shedding time. Besides, the previously established models mainly consisted of variables extracted from clinical and laboratory parameters, and few incorporated epidemic features or therapeutic regimens. Since the first case emerged in Liaoning province on Jan 22nd, 2020, and especially since the rebound outbreak of cases imported from abroad in northeast China (Jilin province) in May 2020, there has been an urgent need to construct a simple, efficient and accurate "early-warning prediction model" for disease progression at an early stage, once a patient is admitted to hospital.
This study aimed to fully characterize the demographic, epidemic, clinical features and therapeutic regimens and to detect their association with postponed viral shedding time and disease progression among patients with COVID-19 in Liaoning province, China.
We then designed and validated two prognostic discriminant models incorporating the associated features. These models can serve as early-warning prediction tools to estimate postponed viral shedding and to identify in advance the risk of progression to a severe stage among patients with COVID-19 infection.

Study design
This retrospective multi-center cohort study included consecutive patients with laboratory-confirmed COVID-19 infection who were enrolled from Jan 22nd to Mar 22nd, 2020 in eight designated hospitals throughout Liaoning province. All enrolled patients were diagnosed with COVID-19 according to the WHO interim guidance [8]. Laboratory confirmation of COVID-19 was achieved by nucleic acid testing using a real-time reverse transcriptase-polymerase chain reaction (RT-PCR) assay at the Liaoning municipal Center for Disease Prevention and Control (CDC). Samples were collected using a nasal swab and/or throat swab from each suspected patient. This study was approved by the institutional review board of each participating site, and written informed consent was waived.

Data collection
The date of disease onset (defined as the day when any symptom was noticed by the patient), the hospital admission date, and the first days on which the nucleic acid test was positive and negative were all recorded. Virus detection was repeated twice every 24 hours. All patients were hospitalized, and clinical outcomes were monitored for at least 8 weeks.
All clinical data on epidemiology (recent exposure history), symptoms, signs, underlying comorbidities, laboratory results (on admission), imaging findings (on admission) and clinical progression were recorded and retrospectively double-extracted from electronic medical records: two independent reviewers extracted the data and evaluated the eligibility of the original data. community-acquired pneumonia (CAP) [9]. Severe/critical cases of COVID-19 had to meet one major criterion (septic shock with need for vasopressors, or respiratory failure requiring mechanical ventilation) or at least three minor criteria: (a) respiratory distress with respiratory frequency ≥30/min; (b) oxygenation index (arterial partial pressure of oxygen/inspired oxygen fraction, PaO2/FiO2) ≤ 250 mmHg; (c) multilobar infiltrates, confusion/disorientation; (d) uremia (blood urea nitrogen ≥20 mg/dL); (e) leukopenia (white blood cell count <4,000 cells/μL); (f) thrombocytopenia (platelet count <100,000/μL); (g) hypothermia (core temperature <36°C); (h) hypotension requiring aggressive fluid resuscitation. Non-severe patients were defined as confirmed cases with fever and respiratory symptoms, with or without radiographic evidence of pneumonia.
Discriminant factors were also assigned quantitative values; some variables, such as the imaging characteristics, were coded starting from 0, with the order based on their influence on disease progression.
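The severe/critical definition given above (one major criterion, or at least three minor criteria) can be written as a small rule. The following is a minimal sketch only: the class name, field names and the dictionary of minor-criterion flags are hypothetical, and each flag is assumed to have been set already by a clinician or an upstream check against the thresholds listed in the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SeverityFindings:
    """Hypothetical container for one patient's severity criteria."""
    # Major criteria: either one alone is sufficient.
    septic_shock_needing_vasopressors: bool = False
    respiratory_failure_needing_ventilation: bool = False
    # Minor criteria keyed by the labels used in the text ("a".."h");
    # True means the criterion is met.
    minor: Dict[str, bool] = field(default_factory=dict)


def is_severe(findings: SeverityFindings) -> bool:
    """Return True if the patient meets the severe/critical definition."""
    major_met = (
        findings.septic_shock_needing_vasopressors
        or findings.respiratory_failure_needing_ventilation
    )
    minor_count = sum(1 for met in findings.minor.values() if met)
    return major_met or minor_count >= 3


# Example: three minor criteria met, no major criterion.
example = SeverityFindings(minor={"a": True, "b": True, "f": True, "g": False})
print(is_severe(example))
```

Keeping the thresholds outside the rule means the same function works whichever laboratory cut-offs a site applies.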

Statistical analysis
Categorical variables were summarized as frequencies and percentages. Continuous variables were described using the median and interquartile range (IQR).
Comparisons between groups were tested by the F test for continuous variables, or by the χ² test for categorical data. Features that differed significantly (p < 0.05) in both algorithms were selected for the models. With disease progression and postponed viral shedding time as the targets, logistic regression, linear discriminant analysis, K-nearest neighbor, support vector machine (SVM) and decision tree models were constructed in Python. After comparing their performance, the optimal discriminant models incorporating multiple related factors were established to reflect the probability of disease progressing to a severe stage or of postponed viral shedding. The model output was used as the outcome: an output ≥ 0 indicated progression to a severe/critical stage or postponed viral shedding time. The performance of the prediction models was further evaluated and validated using the area under the receiver-operating characteristic curve (AUROC).
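To make the AUROC criterion concrete, it can be computed directly as the probability that a randomly chosen positive case receives a higher model output than a randomly chosen negative case, with ties counted as one half. The sketch below uses only the Python standard library; the function name and the toy labels and scores are illustrative assumptions, not the study's data or code.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (e.g. disease progression), 0 otherwise.
    scores: model outputs, where larger means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 7 of the 9 positive/negative pairs are
# ranked correctly.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.2, -0.4, 0.3, 0.1, -0.8]
print(round(auroc(toy_labels, toy_scores), 3))
```

The pairwise form is quadratic in the number of cases, which is of no concern for a cohort of this size.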

Clinical and laboratory features associated with disease progression or postponed viral shedding time
A total of 44 laboratory and clinical records on admission and during hospitalization were obtained and analyzed, including but not limited to the demographics, symptoms, signs, images, blood routine, immunocytochemistry, enzymatic and liver/renal function.
These data were acquired within 24 hours of admission. To regularize the models and reduce overfitting, z-score standardization [x* = (x - mean) / standard deviation] was applied to all continuous variables in the data set. Each feature was thus rescaled to a mean of 0 and a variance of 1, which removes the influence of differing units and scales. The standardized data were then compared between groups divided by disease severity (non-severe vs. severe group) or viral shedding time (with a cutoff value of 14 days) using the F test or χ² test. Features were considered to be significantly associated with disease progression or postponed viral shedding time when p < 0.05 (Table 2). Notably, all data need to be standardized in the same way [x* = (x - μ) / σ] before being plugged into the equation; the corresponding means and standard deviations are shown in Tables 6 and 7.
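As an illustration of how standardized inputs are plugged into a discriminant equation, the sketch below evaluates the disease-progression model y2 using the coefficients reported in the abstract. The feature names are hypothetical labels chosen for readability, and the means and standard deviations must be supplied by the caller, because Tables 6 and 7 are not reproduced in this text.

```python
from typing import Dict, Tuple

# Intercept and coefficients of y2 (disease progression) as given in the abstract.
# Dictionary keys are illustrative names; comments give the paper's indices.
Y2_INTERCEPT = -0.348
Y2_COEFFS: Dict[str, float] = {
    "days_from_jan1_to_onset": -0.099,    # x2
    "age": 0.0945,                        # x4
    "imaging_score": 0.1176,              # x5
    "short_term_wuhan_exposure": 0.0398,  # x8
    "lymphocyte_count": -0.1646,          # x19
    "neutrophil_count": 0.0914,           # x20
    "nl_ratio": 0.1254,                   # x21
    "crp": 0.1397,                        # x22
    "procalcitonin": 0.0814,              # x23
    "ldh": 0.1294,                        # x24
    "creatine_kinase": 0.1099,            # x29
}


def z_score(x: float, mean: float, sd: float) -> float:
    """Standardize one value: x* = (x - mean) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


def discriminant_y2(raw: Dict[str, float],
                    stats: Dict[str, Tuple[float, float]]) -> float:
    """Evaluate y2 from raw values and per-feature (mean, sd) pairs."""
    y = Y2_INTERCEPT
    for name, coef in Y2_COEFFS.items():
        mean, sd = stats[name]
        y += coef * z_score(raw[name], mean, sd)
    return y


def predicts_progression(raw: Dict[str, float],
                         stats: Dict[str, Tuple[float, float]]) -> bool:
    """An output >= 0 is read as predicted progression to severe/critical."""
    return discriminant_y2(raw, stats) >= 0
```

A patient whose values all equal the cohort means has every standardized input at 0, so the output reduces to the intercept (-0.348) and the model predicts no progression.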

Performance of the discriminant models
When applying the discriminant model of postponed viral shedding time onto the validation of the training set (Figure 1a) and 0.73 in the testing dataset (Figure 1b).
According to the confusion matrix of the discriminant model for disease progression (Table 9), during validation on the training set, sensitivity was 0.774, specificity was 0.884, positive predictive value was 0.75, and negative predictive value was 0.897. In the test set, sensitivity was 0.75 and specificity was 0.889, with a positive predictive value of 0.75 and a negative predictive value of 0.889. The recall rate was 75%, and accuracy was 0.846. The ROC curve was also constructed to evaluate the effectiveness of the discriminant models. The AUROC of the combination of 11 demographic, clinical and imaging/laboratory features was 0.829 in the training dataset (Figure 2a) and 0.819 in the testing dataset (Figure 2b).
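The relationship between a confusion matrix and the metrics quoted above can be sketched as follows. The cell counts used in the example (6 true positives, 2 false positives, 16 true negatives, 2 false negatives) are an assumption: they are not reported in this text, and are simply one set of counts that reproduces the test-set figures for the disease-progression model.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard diagnostic metrics from the four confusion-matrix cells."""
    return {
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Assumed counts, chosen only because they match the reported test-set rates.
metrics = confusion_metrics(tp=6, fp=2, tn=16, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, sensitivity and recall coincide at 0.75, which is why the text reports both figures with the same value.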

Discussion
This retrospective study tentatively developed two discriminant models consisting of infection.
In this study, the median viral shedding time was 14 days (IQR, 11-19). For hospitalized patients, one of the criteria for release from isolation and discharge is two negative sputum/oral swab tests at a 24-h interval [5]. However, SARS-CoV-2 in the respiratory tract, especially in sputum, has been observed to be associated with prolonged viral shedding and a high viral load compared with stool specimens [11]. In order to optimize therapeutic strategies and to effectively control transmission in cities or regions with imported cases, it is essential to identify factors associated with the COVID-19 PCR negative conversion time and to establish a prediction model that could identified as discriminatory factors and were devised to discriminant models for prediction of postponed viral shedding of COVID-19 infection. Both older age and delayed antiviral treatment could prolong the virus clearance time. Consistently, an association between delayed initiation of antiviral treatment and prolonged virus shedding was also observed for influenza A (H7N9) and SARS-CoV-2 in previous studies, indicating that timely initiation of antiviral treatment is necessary for viral clearance [12][13][14]. In addition, dual-antiviral therapy of nebulized IFNα with lopinavir/ritonavir (x33) was negatively associated with the viral shedding time, whereas treatment with antibiotics (x37) and methylprednisolone (x40) was related to postponed viral shedding. So far, no antiviral drug targeting the virus or the host cell has been proved effective for the treatment of COVID-19 [15], but a few existing antiviral drugs have raised hopes as far as the reduction of viral load is concerned. Evidence has emerged that SARS-CoV-2 is more susceptible to IFNs than SARS-CoV, as inhalation of interferon-α (IFN-α) 2b could reduce the infection rate significantly and can be used for prophylaxis of SARS-CoV-2 infection [16,17].
In addition, lopinavir/ritonavir (Kaletra) was found to have anti-SARS-CoV efficacy in vitro [18], but showed controversial therapeutic effects compared with standard care in vivo [19][20][21]. Lopinavir/ritonavir is recommended by the National Health Commission of China for the treatment of COVID-19 at present [22]. lopinavir/ritonavir. However, statistical analysis can only indicate association and cannot establish causality. A recent study also observed that early initiation of dual-antiviral treatment with the lopinavir/ritonavir + IFN-α combination could help shorten the duration of SARS-CoV-2 shedding compared with triple antiviral treatment (lopinavir/ritonavir + IFN-α + arbidol) [14]. This conclusion may provide a rationale for clinicians to optimize antiviral treatment and to initiate it early. Conversely, treatment with antibiotics (x37) and methylprednisolone (x40) could prolong the viral shedding time. This finding is consistent with previous studies, which observed that high-dose corticosteroids were associated with increased mortality and longer viral shedding in patients with influenza A (H7N9) viral pneumonia and MERS [23][24][25]. Systemic corticosteroids could increase the risk of opportunistic infections (such as bacterial or fungal infections) that occur secondary to immunosuppression, and eventually hinder virus clearance [26]. Besides, potential bacterial infections secondary to influenza viral infection, which were commonly seen in this study (32.8%), could also prolong the viral clearance time, as suggested by the association between the use of antibiotics and postponed viral shedding time.
Notably, this study observed for the first time that the interval from the emergence of the first case in Liaoning province (Jan 22nd, 2020) to the individual onset of symptoms (x2) could serve as an important prognostic feature in our early-warning model of disease progression. In this study, nearly half of the confirmed cases at the early stage of the COVID-19 outbreak (in January) were severe cases, whereas in the later period (after February), non-severe cases became dominant (76.9%) in Liaoning province.

Another explanation is that pathogens tend to reduce their virulence over time in order to maximize their between-host transmission, which could result in the gradually lowered severity of COVID-19 infection in regions with imported cases [27,28] [29]. ADE hinders the ability to manage inflammation and results in disease progression. Another explanation may lie in the L-type strains of SARS-CoV-2, which are evolutionarily more aggressive and contagious. L-type strains with altered virulence could probably be the underlying causal pathogen for patients who acquired infection via short exposure to the epidemic area [30]. In our discriminant models, both immune features (lymphocytes, neutrophils, N/L ratio, CRP) and enzymatic indexes (LDH, CK) obtained on admission were observed to be the most significant prognostic factors for disease severity, which is consistent with the previous literature. This illustrates that abnormal laboratory features appear before disease progression [31,32].
Overall, the two discriminant models in the present study were demonstrated to have satisfactory sensitivity (>72.00%) and specificity (>73.00%), and they can be used as early-warning tools for disease progression among patients with COVID-19 upon admission. Medical staff can easily make predictions in advance using these two discriminant models and carry out timely and optimal medical intervention at an early stage. However, some limitations of this study should be noted. First, the study design was retrospective and the sample size may be insufficient to characterize an entire population. However, because all patients from eight designated hospitals throughout Liaoning province were included, we consider the patients recruited in this study to be representative of cases diagnosed with COVID-19 in Liaoning, China. Secondly, not enough severe/critical cases were recruited for the present study, possibly because the fatality rate of patients infected with COVID-19 in Liaoning province (1.6%) was lower than the national average (3.2%) [33] and did not resemble that in previous studies from Wuhan [2,34]. Thirdly, we only included the initial antiviral treatments as factors for prolonged shedding duration, so as to minimize bias as much as possible. This discrepancy may have had an unknown influence on the efficacy of the models.
In conclusion, the discriminant models reported here are the first attempt of their kind to develop an early-warning tool for both postponed viral shedding and disease progression in northeast China. We believe that these models can help to judge disease progression early enough in a great number of patients with COVID-19 infection, and that this early judgement can facilitate timely medical intervention, which will ultimately reduce the prevalence and mortality of COVID-19.