We retrospectively retrieved data from administrative databases in Catalonia, a region in north-east Spain with a population of 7.5 million people. Data on potential predictors were retrieved from the Catalan Health Surveillance System (CHSS), which systematically collects data on diagnoses, individual income, and resource utilization from both hospital and primary care settings. Data on outcomes associated with SARS-CoV-2 infection were retrieved from the epidemiological surveillance system in the SARS-CoV-2 registry (RSACovid-19) [12,13]. The stratification model was built using data collected between March 1 and September 15, 2020 (development period), which encompassed the first wave of the COVID-19 outbreak in our area and a period between waves. Data for model validation were collected between September 16 and December 27, 2020, the date the first vaccine was administered in Catalonia (validation period).
All data were handled according to the General Data Protection Regulation 2016/679 on data protection and privacy for all individuals within the European Union and the local regulatory framework regarding data protection. Data from different health administrative databases were linked and de-identified by a team not involved in the study analysis; study investigators only had access to a fully anonymized database. The retrospective use of healthcare data was approved by the Independent Ethics Committee of the IDIAP Jordi Gol (Spain), which waived the need for obtaining informed consent for data utilization.
We considered all variables stored in the CHSS database, including demographic data (i.e., age and sex), resource utilization (e.g., admission to nursing homes), lifestyle information (e.g., smoking and alcohol abuse), current and past diagnoses (including psychiatric disorders), and socioeconomic status. The global comorbidity burden (or patient complexity) was stratified using the adjusted morbidity groups (GMA, Grups de Morbiditat Ajustada), a population-based tool for health-risk assessment [14,15]. The GMA tool considers the weighted sum of all chronic conditions, the number of systems affected, and any acute diagnoses present at the time that may increase patient complexity. Individuals are grouped into four health-risk categories defined using the risk distribution of the entire population: (1) baseline risk (healthy stage, including GMA scores up to the 50th percentile of the total population), (2) low risk, 50th to 80th percentiles, (3) moderate risk, 80th to 95th percentiles, and (4) high risk, above the 95th percentile. Socioeconomic status was stratified according to pharmaceutical co-payment groups, which are based on annual income, as follows: very low (i.e., recipients of rescue aid measures), low (i.e., less than €18,000), middle (i.e., €18,000 to €100,000), and high (i.e., more than €100,000).
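As an illustration of the percentile-based grouping described above, the following minimal Python sketch assigns GMA risk strata and co-payment groups. The function names, the inclusive percentile interpolation, and the handling of values that fall exactly on a boundary are assumptions made for this example; they are not taken from the GMA tool or the CHSS.

```python
from bisect import bisect_left
from statistics import quantiles

# Stratum labels in increasing order of risk, as listed in the text.
STRATA = ["baseline", "low", "moderate", "high"]


def gma_cutoffs(population_scores):
    """Return the 50th, 80th and 95th percentiles of the population's GMA scores.

    Assumption: percentiles use the 'inclusive' interpolation of
    statistics.quantiles; the source does not specify the method.
    """
    pct = quantiles(population_scores, n=100, method="inclusive")
    return [pct[49], pct[79], pct[94]]


def gma_stratum(score, cutoffs):
    """Map one GMA score to a stratum.

    Assumption: a score exactly equal to a cut-off falls in the lower stratum.
    """
    return STRATA[bisect_left(cutoffs, score)]


def income_group(annual_income_eur, receives_rescue_aid=False):
    """Map annual income (in euros) to the co-payment group described in the text."""
    if receives_rescue_aid:
        return "very low"
    if annual_income_eur < 18_000:
        return "low"
    if annual_income_eur <= 100_000:
        return "middle"
    return "high"


# Toy population with scores 1..100 (hypothetical values, for illustration only).
toy_scores = list(range(1, 101))
toy_cutoffs = gma_cutoffs(toy_scores)
toy_strata = [gma_stratum(s, toy_cutoffs) for s in toy_scores]
```

With this toy population, the strata contain roughly 50%, 30%, 15%, and 5% of individuals, mirroring the percentile bands given in the text.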
We analysed three outcomes associated with severe COVID-19: hospital admission, transfer to the intensive care unit (ICU), and death. The scarcity of PCR tests during the pandemic precluded the testing of all suspected cases of COVID-19. For that reason, we considered the COVID-19 diagnosis according to either molecular criteria (positive result with a PCR or serological test) or clinical/epidemiological criteria, as officially established by the RSACovid-19. Owing to the shortage of ICU beds during the first wave (March 3 to July 15, 2020), the start of invasive mechanical ventilation was considered an ICU transfer, irrespective of whether an ICU admission had been registered. All deaths related to COVID-19 were included, whether or not the deceased had been hospitalized.
The dataset for developing the stratification model included all individuals with any of the investigated outcomes within the development period, irrespective of the time of COVID-19 diagnosis. We used generalized linear models (Poisson regression) to build multivariate models for hospitalizations, ICU transfers, and deaths due to COVID-19. The models were created using a "stepwise-forward" approach based on the Akaike Information Criterion (AIC), in which a naïve model is sequentially complemented with the most relevant variables, eventually leading to the main effects model. The models also included all significant first-order interactions between selected variables and sex. Owing to its non-linear behaviour, age was introduced into the model as a continuous variable plus an additional quadratic term. The models provided individual-level estimates of the probability of each outcome (i.e., hospitalization, transfer to ICU, and death) for the entire population of Catalonia. The accuracy of the three models was assessed using the area under the receiver operating characteristic curve (AUC ROC). The four risk strata were defined by crosslinking the three categorized probabilities.
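The forward stepwise procedure described above can be sketched as follows. This is a minimal, standard-library Python illustration and not the code used in the study, which was run in R. It makes several assumptions: the Poisson model uses a log link and is fitted by Newton-Raphson with step halving, the linear and quadratic age terms enter or leave the model together as a single candidate, interactions with sex are omitted, and the data are a small synthetic toy set. All function and variable names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def loglik(beta, X, y):
    """Poisson log-likelihood with a log link."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return total


def fit_poisson(X, y, max_iter=50, tol=1e-8):
    """Fit by Newton-Raphson; column 0 of X must be the intercept. Returns (beta, AIC)."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    ll = loglik(beta, X, y)
    for _ in range(max_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu)) for j in range(p)]
        hess = [[sum(mi * row[j] * row[k] for row, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step, scale = solve(hess, grad), 1.0
        while True:  # halve the step until the log-likelihood stops decreasing
            new_beta = [b + scale * s for b, s in zip(beta, step)]
            new_ll = loglik(new_beta, X, y)
            if new_ll >= ll or scale < 1e-6:
                break
            scale /= 2
        converged = abs(new_ll - ll) < tol
        beta, ll = new_beta, new_ll
        if converged:
            break
    return beta, 2 * p - 2 * ll


def forward_select(terms, y):
    """Greedy forward selection: add the term that lowers AIC most, stop when none does."""
    n = len(y)

    def design(names):
        cols = [[1.0] * n] + [c for name in names for c in terms[name]]
        return [list(row) for row in zip(*cols)]

    selected = []
    best_aic = fit_poisson(design(selected), y)[1]  # naive (intercept-only) start
    while len(selected) < len(terms):
        trials = {name: fit_poisson(design(selected + [name]), y)[1]
                  for name in terms if name not in selected}
        name = min(trials, key=trials.get)
        if trials[name] >= best_aic:
            break
        selected.append(name)
        best_aic = trials[name]
    return selected, best_aic


# Synthetic toy data (hypothetical): 200 individuals, binary outcome.
n = 200
age_c = [((20 + (7 * i) % 60) - 50) / 10 for i in range(n)]  # centred age, in decades
terms = {
    "age": [age_c, [a * a for a in age_c]],  # linear + quadratic term as one candidate
    "gma_high": [[1.0 if i % 4 == 0 else 0.0 for i in range(n)]],
    "sex": [[1.0 if i % 4 in (0, 1) else 0.0 for i in range(n)]],
}
y = [1 if (i % 4 == 0 and (i // 4) % 5 < 3) or i % 10 == 1 else 0 for i in range(n)]
selected, final_aic = forward_select(terms, y)
```

Centring and rescaling age before squaring keeps the linear and quadratic columns on a similar scale, which helps the Newton-Raphson iterations remain numerically stable; this is a choice of the sketch, not a detail reported in the text.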
The stratification model was validated using an independent dataset of all individuals with a positive PCR result for SARS-CoV-2 infection in a respiratory specimen within the validation period. The goodness of fit of the model was assessed using the AUC ROC and the corresponding 95% confidence interval for each outcome. All analyses were performed using R statistical software, version 4.0.0.
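For readers unfamiliar with the metric, the sketch below shows one way to compute an AUC ROC with a 95% confidence interval in plain Python. The text does not state how the interval was obtained; the percentile bootstrap used here is an assumption chosen for simplicity, and the function names are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random non-case.

    Ties count as half a win. Labels must be 0 (no outcome) or 1 (outcome).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if 0 < sum(lab) < n:  # skip resamples that contain a single class
            stats.append(auc(lab, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example (hypothetical values): predicted probabilities and observed outcomes.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 3 of the 4 case/non-case pairs are ordered correctly
```

The pairwise comparison is quadratic in the number of observations, so a rank-based formulation would be preferable for population-scale data.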