Trial design
The CLASSIC trial was a European investigator-initiated, stratified, parallel-group, open-label randomised trial. The trial protocol was approved by the relevant medicines agencies and ethics committees [6]. The trial protocol, statistical analysis plan and primary results have been published elsewhere [6, 12], as has the statistical analysis plan for the 1-year outcomes [11]. Some deviations from the protocol and analysis plan were necessary; these are outlined with rationales in the Electronic Supplementary Material (ESM1). We report this manuscript according to the CONSORT 2010 Statement (checklist in ESM2).
Trial sites and patients
Patients were enrolled from November 2018 to November 2021 in 31 ICUs in Denmark, Sweden, Norway, Switzerland, Italy, the Czech Republic, the United Kingdom, and Belgium after written informed consent was obtained from patients or their legal surrogates according to national regulations [12].
We enrolled adult ICU patients with septic shock according to the SEPSIS-3 criteria [13] who had received at least 1 L of IV fluid in the last 24 hours and whose shock had begun no more than 12 hours before screening. Further details regarding the inclusion and exclusion criteria are presented in ESM1 and elsewhere [6, 12].
Outcomes
The pre-specified secondary outcomes assessed 1 year after randomisation were all-cause mortality, HRQoL, and cognitive function [11]. To increase the follow-up rate and standardise data collection, we developed a standard operating procedure (in ESM1) for all patients [14]. Trial staff made several attempts to obtain follow-up data for at least 4 weeks after the 1-year date. The process was centrally monitored by the coordinating centre in Denmark to support sites in obtaining responses. Survival status was obtained from medical records. Survivors were interviewed by telephone in their native language by certified trial staff (ESM1) who were masked to the intervention. The interviews used the EuroQol 5-dimension 5-level (EQ-5D-5L) questionnaire, the EQ visual analogue scale (EQ VAS) [15, 16], and the Mini Montreal Cognitive Assessment (MoCA) test [17]. In some cases, relatives provided data on survival status or, if necessary, completed the HRQoL questionnaire on behalf of the patient (using the proxy version of the tool). Relatives could not perform the cognitive test.
The EQ-5D-5L is a generic instrument to describe and value health and has 5 dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has 5 response levels: no, slight, moderate, severe or extreme problems [15, 16]. It also includes EQ VAS, for which respondents are asked to mark how good or bad their health is on the day of the questionnaire on a scale from 100 (‘the best health you can imagine’) to 0 (‘the worst health you can imagine’).
The HRQoL outcome measures were the EQ-5D-5L index values and the EQ VAS [16]. The index value is a summary score based on the 5 dimensions that reflects health states according to the preferences of a general population; it ranges from 1.0 (perfect health) to values below 0 (health states valued worse than death, with 0 defined as a state equal to death). We used country-specific value sets to calculate the index values for Denmark [18], Sweden [19], England [20], and Italy [21]. For countries with no specific value set, we contacted the national investigator and agreed on a value set from a country similar in culture and healthcare system. For Switzerland, we used the German value set [22]; for Norway, the Danish value set [18]; and for the Czech Republic, the Polish value set [23]. As recommended, we conducted an additional analysis with index values calculated using the Danish value set [18] for all patients (most patients were enrolled in Denmark) [24].
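The allocation of value sets described above can be written down as a simple lookup. The following is a minimal sketch only, not the trial's analysis code: the names `VALUE_SET_BY_COUNTRY` and `value_set_for` are hypothetical, the pairing of the United Kingdom with the English value set is our reading of the text, and no value set is assumed for any country not named in the paragraph above, for which the lookup raises an error rather than guessing.

```python
# Hypothetical lookup of the value set used per enrolling country,
# transcribed from the text. Countries not listed are deliberately absent.
VALUE_SET_BY_COUNTRY = {
    "Denmark": "Denmark",
    "Sweden": "Sweden",
    "United Kingdom": "England",   # assumption: UK sites use the English set
    "Italy": "Italy",
    "Switzerland": "Germany",      # no national set; German set used
    "Norway": "Denmark",           # no national set; Danish set used
    "Czech Republic": "Poland",    # no national set; Polish set used
}


def value_set_for(country: str, danish_for_all: bool = False) -> str:
    """Return the name of the value set applied to a patient's country.

    With danish_for_all=True, mimic the additional analysis in which the
    Danish value set is applied to every patient.
    """
    if danish_for_all:
        return "Denmark"
    if country not in VALUE_SET_BY_COUNTRY:
        raise LookupError(f"no value set stated in the text for {country!r}")
    return VALUE_SET_BY_COUNTRY[country]
```

For example, `value_set_for("Norway")` gives the Danish value set, and `value_set_for("Italy", danish_for_all=True)` also gives the Danish value set, as in the additional analysis.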
The Mini MoCA is a short version of the MoCA test [17] validated for telephone use [25]. The Mini MoCA consists of 4 cognitive dimensions: attention (immediate recall of 5 words), executive functions and language (1-minute verbal fluency), orientation (6 items on date and geographic orientation), and memory (delayed recall and recognition of 5 previously learned words). The total score ranges from 0 to 30, with lower values indicating worse cognitive function. To correct for any educational effect on the cognitive test, 1 point is added for participants with 12 years of education or less (scores are capped at the maximum of 30 points) [26]. Further details on the Mini MoCA are presented in ESM1.
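The education correction can be illustrated with a short sketch. It assumes only what is stated above (a 0 to 30 total score, 1 extra point for 12 years of education or less, and a cap at 30); the function name `education_adjusted_score` is hypothetical and this is not the trial's scoring code.

```python
MAX_SCORE = 30  # upper bound of the total score as stated in the text


def education_adjusted_score(raw_score: int, years_of_education: float) -> int:
    """Add 1 point for 12 years of education or less, capped at MAX_SCORE."""
    if not 0 <= raw_score <= MAX_SCORE:
        raise ValueError("raw score must lie between 0 and 30")
    bonus = 1 if years_of_education <= 12 else 0
    return min(raw_score + bonus, MAX_SCORE)
```

A participant scoring 24 with 10 years of education would thus be assigned 25, whereas a participant scoring 30 stays at 30 regardless of education.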
Statistical analyses
We deviated from the predefined analysis plan in the following ways [11]: 1) HRQoL and Mini MoCA scores were non-normally distributed, so we used the Kryger Jensen and Lange test only [27]; 2) the statistical handling of mortality was not clearly specified, so we primarily used adjusted logistic regression models with G-computation and non-parametric bootstrapping; 3) we added secondary analyses in survivors only; and 4) we added best-worst and worst-best case scenario sensitivity analyses for missing data even though Little’s test rejected the hypothesis that data were missing completely at random (described in detail, with reasoning, in ESM1).
The analysis population consisted of all randomised patients (n = 1554) except 5 who withdrew consent for the use of all data. We present descriptive baseline data stratified by treatment allocation and by survival/response status for HRQoL and cognitive outcomes. Numerical data were summarised using medians with interquartile ranges (IQRs) and categorical data were summarised using numbers with percentages.
As more than 5% of the patients had missing outcome data (8.8% for EQ-5D-5L index values, 9.2% for EQ VAS, and 13.8% for Mini MoCA), we conducted Little’s test, which indicated that data were not missing completely at random (P < 0.001). Consequently, we conducted the primary analyses of these outcomes after multiple imputation of missing data [28]. We used predictive mean matching with 50 datasets imputed separately in each treatment group; the imputation model included the stratification variables (trial site and metastatic or hematologic cancer), baseline values, and all outcomes (ESM1). Additionally, we conducted best-worst and worst-best case imputations of missing data using the mean ± 1 standard deviation (SD) of the EQ-5D-5L index values, EQ VAS and Mini MoCA. For survivors with missing data, the mean and SD were taken from survivors with complete responses; for patients with missing survival status, they were taken from all patients with available data. We also conducted complete case analyses; this approach was also used for the mortality outcome because of limited missing data (2.1%).
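The best-worst and worst-best case imputations can be sketched as follows. This is an illustration under stated assumptions, not the trial's code (details are in ESM1): we assume that "best-worst" assigns the favourable value (mean + 1 SD) to missing observations in the first group and the unfavourable value (mean − 1 SD) to those in the second, with "worst-best" doing the reverse. The function name `impute_scenario` and the use of `None` for missing values are hypothetical, and imputed values are not clipped to the range of the scale.

```python
from statistics import mean, stdev


def impute_scenario(group_a, group_b, reference, scenario):
    """Fill missing values (None) with the reference mean +/- 1 SD.

    reference holds the observed scores that supply the mean and SD, for
    example survivors with complete responses.
    """
    centre, spread = mean(reference), stdev(reference)
    high, low = centre + spread, centre - spread
    if scenario == "best-worst":      # assumption: group_a gets the benefit
        fill_a, fill_b = high, low
    elif scenario == "worst-best":    # assumption: group_b gets the benefit
        fill_a, fill_b = low, high
    else:
        raise ValueError("scenario must be 'best-worst' or 'worst-best'")
    filled_a = [fill_a if value is None else value for value in group_a]
    filled_b = [fill_b if value is None else value for value in group_b]
    return filled_a, filled_b
```

If the two scenarios lead to the same conclusion, the result is unlikely to be driven by the missing data.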
The primary analyses of all outcomes were adjusted for the stratification variables, whereas secondary analyses were unadjusted. For the primary analysis of mortality at 1 year, we used a G-computation procedure based on an adjusted logistic regression model with 50,000 bootstrap resamples; for the unadjusted secondary analysis, we used generalised linear models with binomial error distributions and log/identity links. Results are presented as average (unconditional) risk differences (RDs) and relative risks (RRs) with 99% confidence intervals (CIs), supplemented with a Kaplan-Meier survival curve. For the continuous outcomes, we used the Kryger Jensen and Lange test [27] to calculate P-values, and linear regression models with a procedure similar to that used for the primary analysis of mortality to estimate average (unconditional) mean differences (MDs) and ratios of means (RoMs) with 99% CIs. For the primary analyses of the numerical outcomes, patients who had died at 1 year were included with scores of zero. This corresponds to a health state equivalent to death for EQ-5D-5L index values, the worst possible perceived health state for EQ VAS, and the worst possible cognitive function score for Mini MoCA [16]. We also analysed EQ-5D-5L index values, EQ VAS, and Mini MoCA in survivors only. Finally, in secondary analyses, we analysed EQ-5D-5L index values calculated using the Danish value set, both in all patients and in survivors only.
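To make the G-computation step concrete, the following sketch fits a logistic regression, predicts every patient's risk of death under each allocation, averages the predictions, and takes their difference (RD) and ratio (RR). It is a simplified illustration and not the trial's R code: it adjusts for a single binary covariate rather than the actual stratification variables, uses a percentile bootstrap interval (the type of bootstrap interval is our assumption), and all function names are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def predict(beta, row):
    """Predicted probability; the linear predictor is clamped for stability."""
    eta = sum(b * x for b, x in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, eta))))


def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            mu = predict(beta, row)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * row[j] * row[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def g_computation(treat, covariate, died):
    """Average (unconditional) RD and RR from an adjusted logistic model."""
    X = [[1.0, float(a), float(c)] for a, c in zip(treat, covariate)]
    beta = fit_logistic(X, died)
    n = len(died)
    # Predict for every patient as if allocated to each group in turn.
    risk1 = sum(predict(beta, [1.0, 1.0, float(c)]) for c in covariate) / n
    risk0 = sum(predict(beta, [1.0, 0.0, float(c)]) for c in covariate) / n
    return risk1 - risk0, risk1 / risk0


def bootstrap_ci(treat, covariate, died, resamples=1000, level=0.99, seed=1):
    """Percentile bootstrap CI for the RD (interval type is an assumption)."""
    rng = random.Random(seed)
    n = len(died)
    rds = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        rd, _ = g_computation([treat[i] for i in idx],
                              [covariate[i] for i in idx],
                              [died[i] for i in idx])
        rds.append(rd)
    rds.sort()
    lower = max(0, round((1 - level) / 2 * resamples))
    upper = min(resamples - 1, round((1 + level) / 2 * resamples) - 1)
    return rds[lower], rds[upper]
```

Because the predictions are averaged over the whole sample, the resulting RD and RR are marginal rather than conditional on the covariates. The pure-Python fit is slow, so the default number of resamples here is far below the 50,000 used in the trial.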
Analyses were performed using R (R Core Team, R Foundation for Statistical Computing, Vienna, Austria), versions 4.2.0 and 4.2.1. P-values below 0.01 were considered statistically significant due to multiple testing [11].