Study Population
This retrospective cohort study included two cohorts of consecutive inpatients with COVID-19 from Henan Provincial People's Hospital (Zhengzhou, Henan Province, China) and Anyang Infectious Disease Hospital (Anyang, Henan Province, China), both of which were designated hospitals for the treatment of COVID-19 in Henan Province. A total of 123 patients with confirmed COVID-19 who were transferred to these two hospitals between February 3, 2020 and March 31, 2020 were included: 58 from Henan Provincial People's Hospital and 65 from Anyang Infectious Disease Hospital.
Diagnostic Criteria
COVID-19 was confirmed when throat swab specimens tested positive for SARS-CoV-2 RNA by real-time reverse transcription polymerase chain reaction (RT-PCR), the gold standard for detecting SARS-CoV-2 infection as recommended by WHO interim guidance [8]. Cessation of SARS-CoV-2 RNA shedding was defined as negative RT-PCR results on two consecutive throat swab specimens collected at least 24 hours apart. The duration of SARS-CoV-2 RNA shedding was defined as the time from illness onset to cessation of shedding. There is currently no consensus on the cut-off value for prolonged SARS-CoV-2 RNA shedding. Previous studies reported a median shedding duration of 12 to 21 days [4, 9]; therefore, in this study, long-term SARS-CoV-2 RNA shedding was defined as a shedding duration of more than 21 days.
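The shedding definitions above can be expressed as a short calculation. The Python sketch below is illustrative, not the authors' code: the function names, the (sample date, positive?) input format, and the choice to date cessation at the first of the two qualifying negative swabs are assumptions, because the text does not state which negative swab marks cessation.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Shedding longer than 21 days is classed as long-term in this study.
LONG_TERM_CUTOFF_DAYS = 21


def cessation_date(swabs: List[Tuple[date, bool]],
                   min_gap: timedelta = timedelta(hours=24)) -> Optional[date]:
    """Return the date shedding ceased, or None if it never did.

    `swabs` holds (sample_date, is_positive) pairs. Cessation is the first pair
    of consecutive negative swabs taken at least `min_gap` apart; dating it at
    the first of the two negatives is an assumption of this sketch.
    """
    ordered = sorted(swabs)
    for (d1, pos1), (d2, pos2) in zip(ordered, ordered[1:]):
        if not pos1 and not pos2 and d2 - d1 >= min_gap:
            return d1
    return None


def shedding_duration_days(onset: date, swabs: List[Tuple[date, bool]]) -> Optional[int]:
    """Days from illness onset to cessation of RNA shedding."""
    stop = cessation_date(swabs)
    return None if stop is None else (stop - onset).days


def is_long_term(duration_days: int) -> bool:
    """Long-term shedding means strictly more than 21 days."""
    return duration_days > LONG_TERM_CUTOFF_DAYS
```

With illness onset on February 1, 2020 and negative swabs on February 22 and 24, the duration is 21 days, which is not classed as long-term because the cut-off requires more than 21 days.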
Inclusion criteria: (1) confirmed diagnosis of COVID-19; (2) a known duration of viral shedding; (3) complete clinical data.
Exclusion criteria: (1) in-hospital death; (2) incomplete clinical data.
Of the 123 patients with confirmed COVID-19, 19 with incomplete clinical data and 7 who died in hospital were excluded, leaving 97 patients enrolled as study subjects. The flow chart of the study population is shown in Fig. 1.
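A minimal sketch of how the inclusion and exclusion criteria reduce the confirmed cases to the analysed cohort. The `Patient` record, its fields, and the `is_eligible` helper are hypothetical; confirmation of COVID-19 (inclusion criterion 1) is assumed to have been checked before this step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Patient:
    complete_data: bool           # all required clinical variables recorded
    died_in_hospital: bool        # exclusion criterion (1)
    shedding_days: Optional[int]  # None when the shedding outcome is unknown


def is_eligible(p: Patient) -> bool:
    """Apply inclusion (2)-(3) and exclusion (1)-(2) to one confirmed case."""
    return p.complete_data and not p.died_in_hospital and p.shedding_days is not None


def apply_criteria(patients: List[Patient]) -> List[Patient]:
    """Keep only the patients who satisfy every criterion."""
    return [p for p in patients if is_eligible(p)]
```

For example, a synthetic list of 123 records in which 19 are flagged as incomplete and a different 7 as in-hospital deaths yields 97 eligible records, matching the reported counts only because the sketch assumes the two exclusion groups do not overlap.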
Data collection
We used a standardized data collection form to extract demographic, epidemiological, clinical, laboratory, and radiological data, as well as the duration of SARS-CoV-2 RNA shedding, from electronic medical records.
Laboratory procedures
Methods for the laboratory confirmation of SARS-CoV-2 infection have been described elsewhere [8]. After clinical symptoms improved, throat swab samples were collected every other day for repeat SARS-CoV-2 RT-PCR testing; only qualitative results were available. Routine laboratory tests were performed, including complete blood count, coagulation profile, serum biochemistry [liver function, kidney function, glucose, creatine kinase (CK), and lactate dehydrogenase (LDH)], and inflammatory markers such as C-reactive protein (CRP) and procalcitonin (PCT). All hospitalized patients underwent CT scans. The frequency of examinations was determined by the treating physician.
We used the Chinese management guideline for COVID-19 (version 8.0) to define disease severity [10]. In this study, severe and critical cases were grouped as severe cases, and mild and moderate cases were grouped as non-severe cases.
Statistical analysis
Normally distributed continuous variables were presented as mean ± standard deviation (SD), non-normally distributed continuous variables as median and interquartile range (IQR), and categorical variables as frequency, n (%). Differences between the long-term and short-term viral shedding groups were compared using the Mann-Whitney U test, χ² test, or Fisher's exact test, as appropriate. Univariable and multivariable logistic regression analyses were used to assess the significance of each variable and to identify independent risk factors for long-term SARS-CoV-2 RNA shedding.
SPSS software v.15.0 (IBM Inc., Chicago, IL, USA) and R software v.3.6.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analysis. All significance tests were two-tailed, and P < 0.05 was considered statistically significant.
Feature selection and nomogram establishment
Variables that were statistically significant in the multivariable logistic regression analysis were selected as the most useful predictive features in our cohort. To give clinicians a quantitative tool for predicting an individual patient's probability of long-term viral shedding, we built a nomogram based on the multivariable logistic regression model using the R rms package.
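A nomogram maps a fitted logistic model onto a points scale. The sketch below shows one common convention for this conversion, in which the predictor whose contribution to the linear predictor has the widest range spans 0 to 100 points, and total points map back to a probability through the logistic function. It assumes every predictor enters the model linearly; the coefficients, ranges, intercept, and predictor names (`var_a`, `var_b`) are hypothetical and are not results from this study.

```python
import math
from typing import Dict, Tuple

Coefs = Dict[str, float]
Ranges = Dict[str, Tuple[float, float]]


def _max_span(coefs: Coefs, ranges: Ranges) -> float:
    """Widest range of any single predictor's contribution to the linear predictor."""
    return max(abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items())


def _low_contribution(b: float, rng: Tuple[float, float]) -> float:
    """Smallest value of b * x over the predictor's range (scored as 0 points)."""
    lo, hi = rng
    return min(b * lo, b * hi)


def nomogram_points(coefs: Coefs, ranges: Ranges, values: Dict[str, float]) -> Dict[str, float]:
    """Points for each predictor value on the 0-to-100 scale."""
    scale = 100.0 / _max_span(coefs, ranges)
    return {k: scale * (b * values[k] - _low_contribution(b, ranges[k]))
            for k, b in coefs.items()}


def probability_from_points(intercept: float, coefs: Coefs, ranges: Ranges,
                            total_points: float) -> float:
    """Convert total points back to the linear predictor, then to a probability."""
    lp = intercept + sum(_low_contribution(b, ranges[k]) for k, b in coefs.items())
    lp += total_points * _max_span(coefs, ranges) / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


# Hypothetical model: logit(p) = -4.0 + 0.05 * var_a + 1.2 * var_b
coefs = {"var_a": 0.05, "var_b": 1.2}
ranges = {"var_a": (20, 80), "var_b": (0, 1)}
pts = nomogram_points(coefs, ranges, {"var_a": 60, "var_b": 1})
prob = probability_from_points(-4.0, coefs, ranges, sum(pts.values()))
```

Here `var_a = 60` scores about 66.7 points and `var_b = 1` scores 40 points; the resulting probability equals the one obtained directly from the hypothetical logistic equation (linear predictor 0.2), which is the property a nomogram is designed to preserve.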
Predicting Performance of the Nomogram
The predicted probabilities of long-term viral shedding from the logistic regression model were used to construct a receiver operating characteristic (ROC) curve. To quantify the discriminative performance of the nomogram, the area under the ROC curve (AUROC) was calculated. Calibration curves were plotted, accompanied by the Hosmer-Lemeshow test, to evaluate the calibration of the nomogram.