Informing the public health response to COVID-19 (and lessons learnt for future pandemics): a systematic review of risk factors for disease, severity, and mortality.


Background: Severe Acute Respiratory Syndrome coronavirus-2 (SARS-CoV-2) has challenged public health agencies globally. In order to effectively target government responses, it is critical to identify the individuals most at risk of coronavirus disease-19 (COVID-19), developing severe clinical signs, and mortality. We undertook a systematic review of the literature, to present the current status of scientific knowledge in these areas and describe the need for unified global approaches, moving forwards, as well as lessons learnt for future pandemics.
Methods: Medline, Embase and Global Health were searched to the end of April 2020, as well as the Web of Science. Search terms were specific to the SARS-CoV-2 virus and COVID-19. Comparative studies of risk factors from any setting, population group and in any language were included. Titles, abstracts and full texts were screened by two reviewers and extracted in duplicate into a standardised form. Data were extracted on risk factors for COVID-19 disease, severe disease, or death and were narratively and descriptively synthesised.
Results: 1,238 papers were identified post-deduplication. 33 met our inclusion criteria, of which 26 were from China. Six assessed the risk of contracting the disease, 20 the risk of having severe disease and ten the risk of dying. Age, gender and co-morbidities were commonly assessed as risk factors. The weight of evidence showed increasing age to be associated with severe disease and mortality, and general comorbidities with mortality. Only seven studies presented multivariable analyses and power was generally limited. A wide range of definitions were used for disease severity.
Conclusions: The volume of literature generated in the short time since the appearance of SARS-CoV-2 has been considerable. Many studies have sought to document the risk factors for COVID-19 disease, disease severity and mortality; age was the only risk factor based on robust studies and with a consistent body of evidence. Mechanistic studies are required to understand why age is such an important risk factor. At the start of pandemics, large, standardised, studies that use multivariable analyses are urgently needed so that the populations most at risk can be rapidly protected. This review was registered on PROSPERO as CRD42020177714.


Introduction
The world is currently experiencing a pandemic of coronavirus disease (COVID-19) caused by the Severe Acute Respiratory Syndrome coronavirus-2 (SARS-CoV-2).(1) The risk of morbidity and mortality from the virus is strongly stratified, with poor clinical outcomes considered more likely in certain vulnerable groups. For example, studies from different countries have established that older age groups are at increased risk of death.(2,3) The ability to identify the population groups most at risk from the virus has manifold public health purposes. Using such data, stratified vaccination policies for governmental delivery can be designed, similar to those for influenza.(4) Access to healthcare facilities can be prioritised, i.e. through early identification of the individuals most likely to progress to severe disease and thus to need intensive care and ventilation. Official advice can be issued to vulnerable groups to let them know that they are more at risk from the SARS-CoV-2 virus, to promote behaviour modification.(5, 6) Such population groups can also be the target of more formalised 'segment and shield' approaches, whilst relaxing restrictions for the rest of the population.(7) Potential public health policies along this route have been critiqued, however, on an inclusivity basis, particularly due to the unintended harmful consequences to already marginalised groups.(8) In the UK, vulnerable people have been stratified into two tiers (Table 1): those at risk of severe illness, who were advised to be particularly stringent with social distancing measures, and those within that group at further risk (described as 'shielded' individuals), who were advised to self-isolate and were provided with additional advice.(9-12) The former categorisation was based on the groups targeted for National Health Service programmes on influenza vaccination and the latter on clinical consensus. These strata were deliberately broad, to maximise the number of individuals protected.
As the evidence evolves (e.g. regarding the involvement of the cardiovascular system(13)), there is the opportunity for the

Reference lists of included papers and review articles were also searched, as was the grey literature of public health reports for the 26 countries with the highest numbers of reported patients with COVID-19 at the end of April 2020.

Eligibility criteria and study selection
The following inclusion and exclusion criteria were applied to the search results. Two reviewers independently screened all titles, abstracts and full texts for both literature searches. Discrepancies were resolved by consensus. Studies published in languages other than English were screened by at least one additional reviewer, with further quality control by another member of the reviewing team.

Data extraction
Three reviewers independently double-extracted the studies into a pre-designed spreadsheet that collected: Results were compared and discrepancies resolved by discussion. Data from studies published in languages other than English were extracted by two additional reviewers, with further quality control by another member of the reviewing team.

Quality assessment
Two reviewers independently assessed the quality of included studies. Studies published in languages other than English were quality assessed by two additional reviewers, with further quality control by another member of the reviewing team. Assessments were undertaken from the perspective of the objectives of this review, which were not necessarily identical to the objectives of the underlying studies. The quality of included studies was assessed using a checklist adapted from Downs and Black,(15) as per the guidance issued by Deeks et al.(16) When assessing the power of studies, the minimum sample size required to detect a relative increase in risk of 10% from a statistically conservative baseline of 50% among the unexposed was calculated at different powers using the Kelsey method within Epi Info. This 10% value was based on governmental discussions taking place in the UK at the time the review took place. An alpha of 5% was set as the standard. Pragmatically, we assumed only two strata and a ratio of 1:1 between exposure strata. Different thresholds were used for case-control studies and for cohort or cross-sectional studies. These criteria were scored from 0 (<70% power) to 5 (>99% power). We considered results sufficiently adjusted for confounding if they adjusted for at least the minimal variable set of age, sex, ethnicity and any measure of comorbidities. For ethnically homogeneous populations, the need for adjustment for ethnicity was discounted. If two analyses were presented within a single paper with different quality scores, the most conservative score was retained. Studies were not excluded on the basis of the quality assessment.
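The power calculation described above can be sketched as follows. This is a minimal illustration, assuming the commonly cited form of the Kelsey formula for cohort or cross-sectional designs with a two-sided alpha; it is not the Epi Info implementation, and Epi Info's figures may differ slightly through rounding. The function name `kelsey_sample_size` and the intermediate power levels are assumptions chosen for illustration, and the separate thresholds used for case-control studies are not reproduced here.

```python
from math import ceil
from statistics import NormalDist


def kelsey_sample_size(p_unexposed, relative_increase, power, alpha=0.05, ratio=1.0):
    """Exposed-group size from the Kelsey formula (cohort or cross-sectional design).

    ratio is the number of unexposed individuals per exposed individual.
    """
    p_exposed = p_unexposed * (1.0 + relative_increase)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    # Weighted average risk across the two exposure strata
    p_bar = (p_exposed + ratio * p_unexposed) / (1.0 + ratio)
    n_exposed = (
        (z_alpha + z_beta) ** 2 * p_bar * (1.0 - p_bar) * (ratio + 1.0)
        / (ratio * (p_exposed - p_unexposed) ** 2)
    )
    return ceil(n_exposed)


if __name__ == "__main__":
    # Review assumptions: 50% baseline risk, 10% relative increase, alpha 5%, 1:1 ratio.
    # The power levels listed here are illustrative only.
    for power in (0.70, 0.80, 0.90, 0.95, 0.99):
        n = kelsey_sample_size(0.50, 0.10, power)
        print(f"power {power:.0%}: {n} exposed and {n} unexposed")
```

Under these assumptions a study needs well over a thousand participants per exposure stratum even at 80% power, which gives a sense of why most included studies scored poorly on this criterion.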

Analysis and synthesis
Studies were grouped on the basis of the outcome examined (disease, disease severity, mortality) and then the risk factors examined. Results were classified on the basis of whether they presented evidence as to the exposure under study being a risk factor, taking into account the number of individuals exposed. Where studies focussed on a single risk factor of interest with adjustment for confounding, we extracted all data on potential risks in order to maximise the value of our dataset (whilst accepting that such mutually adjusted estimates for covariates may remain confounded even if that for the primary exposure does not). (17) As there was substantial heterogeneity in study design, reporting, and the risk factors examined, we present a detailed descriptive summary and narrative synthesis of our findings, rather than a meta-analysis.

Registration and reporting
This review was registered on PROSPERO as CRD42020177714 and is reported according to the PRISMA guidelines.

Results
2,868 hits were obtained by the searches across the two dates (Figure 1). After deduplication across the different databases, this was reduced to 1,238. 30 studies were included at the extraction stage; the main reasons for exclusion were small numbers of participants and studies not having a comparator population. From the grey literature an additional report was included and two studies were identified from reference lists.
Included studies are presented in Table 2. 29 of the 33 studies were conducted in China, with one each from France, Italy, Singapore and a combined study from England, Wales and Northern Ireland. Six were studies with COVID-19 disease as the outcome, 20 of disease severity and ten of mortality. One additional study looked at a combined outcome of disease severity and mortality. (18), (19), (20), (21)

Quality assessment
Included studies were generally too small to detect a 10% increase in risk of disease, disease severity, or mortality (Table 3). One study among the 33 was assessed to have 95% power and two others 99%; all were large, national, investigations. As 26 studies were purely descriptive or presented univariable analysis only, there was no adjustment for confounding.
Remaining studies with a regression component did not adjust for our minimal confounder set. Only nine studies provided estimates of the random variability of effect estimates. The majority of studies ascertained exposure information from clinical records, which would have collected data prospectively and thus with limited recall bias. Blinding of outcome and exposure recording by investigators was not documented. In the case of certain disease severity outcomes, such as admission to intensive care units (ICU), variability in thresholds for reaching these outcomes is likely to exist between settings and clinicians.

Risk factors for disease
Six studies compared the likelihood of having COVID-19 to other infectious conditions (Table 4). Of note, as testing strategies were largely focussed on hospitalised individuals, i.e. those displaying noticeable symptoms, studies were of the likelihood of COVID-19 disease, rather than more broadly of SARS-CoV-2 infection (and particularly of severe disease, although patients with mild and symptomatic infection were also reported to be hospitalised in some studies for the purposes of isolation or observation). Age and sex were key foci as potential risk factors, comparing patients with COVID-19 to either: a) SARS-CoV or Middle East Respiratory Syndrome (MERS), or b) other forms of pneumonia. Generally, sex ratios were skewed such that men were over-represented among those with disease. In England, Northern Ireland, and Wales, Asian and Black individuals were found to be at increased risk of COVID-19 in descriptive analyses, with 15.4% and 10.7% of patients falling into these groupings, respectively, versus 5.8% and 2.8% of individuals with other viral pneumonia.(29) Higher body mass index (BMI) was also suggested to be a risk factor in two descriptive analyses; for example, in the Intensive Care National Audit and Research Centre (ICNARC) report, 31.2% of COVID-19 patients had a BMI of 30-<40, versus 23.5% of people with other viral pneumonia.(29, 36) Given the large, national, scope of the ICNARC dataset, results from it are particularly likely to be reliable.
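To make the size of these descriptive contrasts easier to read, the reported percentages can be expressed as crude ratios. The short sketch below only divides the figures quoted above; the resulting ratios are unadjusted, carry no confidence intervals, and should not be read as effect estimates. The names `reported` and `crude_ratio` are illustrative.

```python
# Percentages quoted in the text: (COVID-19 patients, patients with other viral pneumonia)
reported = {
    "Asian ethnicity": (15.4, 5.8),
    "Black ethnicity": (10.7, 2.8),
    "BMI 30-<40": (31.2, 23.5),
}


def crude_ratio(pct_covid, pct_comparator):
    """Ratio of two reported percentages; purely descriptive and unadjusted."""
    return pct_covid / pct_comparator


crude_ratios = {name: round(crude_ratio(*pair), 2) for name, pair in reported.items()}

if __name__ == "__main__":
    for name, ratio in crude_ratios.items():
        print(f"{name}: {ratio} times as common among COVID-19 patients")
```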

Risk factors for severe disease
Among the 20 studies of risk factors for severe versus milder disease and one of a mixed outcome (severe disease and death), a wide array of definitions of severity were used, such as ICU admission, the need for mechanical ventilation, and various measures of respiration and oxygenation (Table 2). Many risk factors were examined (Table 5). As well as potential demographic risks (age, sex, ethnicity), behavioural traits (smoking) and broad clinical factors (BMI, infectious diseases) were analysed. Large numbers of papers sought to explore the implications of different comorbidities on the risk of severe COVID-19, particularly respiratory and cardiovascular conditions. The least equivocal evidence was presented for age as a risk factor, including four studies where it was an independent risk in a multivariable regression model. (18, 19,

Risk factors for mortality
Ten studies examined risk factors for mortality, often by nesting case-control studies within prospective or retrospective cohorts (Table 6). Among these studies, many included statistical testing, but none presented an adjusted regression model for the risk factors considered.
Eight studies examined age and all provided evidence for it being a risk factor for mortality,(20,24,26,34,43-46) although none adjusted for other factors, such as comorbidities. Age groups from 50 upwards were considered particularly at risk. In the single regression analysis, the hazard rate for death in those 65 years or over was estimated to be six times that of individuals under 65.(43) The evidence was similarly consistent for general comorbidities (albeit all the studies were descriptive); among individuals who died, comorbidities were 1.5 to 2.8 times more common than among those who survived.(20,34,45,46,50) Evidence was more equivocal, but still in favour, of hypertension, (3, 20,

Discussion
In this systematic review of risk factors for COVID-19 disease, disease severity and mortality, we document 33 comparative studies examining sociodemographic, behavioural and clinical exposures. Age and sex were very commonly examined; a wide array of comorbidities have also been considered.
Within the synthesised evidence, risk factors for mortality were the clearest, plausibly partly because this outcome is easy to define. Increasing age (different studies presented different thresholds, but being over 50 years of age was common) was an uncontested risk factor.
Five studies also presented evidence for the presence of any comorbidities being a risk factor,(20,34,45,46,50) with none demonstrating evidence against. Given the increasing prevalence of comorbidities with age, the lack of adjustment for confounding in these studies likely over-emphasises the effect size of each risk factor. We note that work subsequent to our literature search documents an independent effect of age on COVID-19 mortality from overall comorbidities, as measured by the Charlson Comorbidity Index Score, but not vice versa.(51) Another study published outside of the time range of our search found both age and an array of comorbidities, each analysed separately (chronic cardiac disease, chronic pulmonary disease, chronic kidney disease, chronic neurological disease, dementia, malignancy, moderate/severe liver disease, and obesity), to be independent risk factors (as well as sex).(52)
Risk factors for severe disease were more complex to synthesise, likely due to the mixed array of outcome measures that can also be prone to observer bias. The impact of age was very commonly assessed, generally showing evidence in favour of this being a risk factor (with a similar age spectrum to the mortality data). Ethnicity was studied in two publications,(25, 29) with mixed results. We note that such findings are likely to be highly context-specific, given that ethnicity acts as a proxy for a series of sociodemographic factors that are highly relevant to the spread of an infectious condition (as well as, perhaps, some biological traits).
Studies of risk factors for COVID-19 disease have been complicated by testing strategies globally, which have largely been concentrated on severe disease. As our knowledge of the full symptom spectrum of the disease moves forward, it will be possible to have a broader case definition that does not solely focus on viral testing, and thus the ability for more generalised complementary studies. Additionally, serological surveys assessing the history of infection with SARS-CoV-2 in different population groups will allow the identification of risk factors for infection, whether symptomatic or not. Both ethnicity (Black and Asian individuals at higher risk; from a single study in England, Northern Ireland and Wales)(29) and higher BMI were found to be associated with disease severity within the included literature,(29,36) again from descriptive studies only. While these studies were not eligible for our review, we note a series of reports from non-comparative studies documenting the potential influence of ethnicity on the likelihood of getting COVID-19, e.g. the work of Price-Haywood from the US.(51) Male sex was reasonably consistently shown to be a risk factor for the presence of COVID-19 but not for severity of disease or mortality.(23,29,39) As with ethnicity, socioeconomic and behavioural factors make this association likely to vary between settings.
This pathogenesis therefore offers several points where co-morbidities may exacerbate the process. The target receptor TMPRSS2 is modulated in response to air pollution and in autoimmune conditions such as asthma,(54) which may affect the number of receptors available for SARS-CoV-2 to target, and ACE2 is involved in the renin-angiotensin system (RAS) which controls blood pressure.
Viral interference causes dysfunction, which leads to a pro-inflammatory state and increased vascular permeability in response to changes in vascular contraction and sodium homeostasis -exacerbating the effect from the physical damage to the affected cells. Conditions causing hypertension -both primary and secondary to renal disease, endocrine dysfunctions such as hypothyroidism, cardiovascular dysfunction such as arteriosclerosis, or neurological dysfunctions such as acute stress -also affect the RAS,(57) meaning that these conditions might be expected to exacerbate pathology caused by SARS-CoV-2. Any condition creating a pro-inflammatory state, such as type II diabetes or pre-existing infection, or involving autoimmunity, such as type I diabetes, might also be expected to contribute to increased pathology. There is also the direct effect of cell damage -if the target tissues are already damaged this reduces 'spare' capacity and therefore the leeway for adaptation to allow the host to continue to maintain homeostasis whilst still being able to eliminate the pathogen and repair the damage. The need for inflammatory cells to clear the infection is also a potential area of interface with comorbidities e.g. conditions such as unsuppressed HIV infection or congenital deficiencies, or the administration of immunosuppressant drugs.
The effect of ageing was particularly strong within our review, both in terms of the magnitude of effect estimates and the number of studies presenting evidence. As well as the above impact of comorbidities, we note that the host's age may influence pathogenesis, both in terms of the likelihood of having various comorbidities, and also due to its effect on the immune system. Indeed, the immune system becomes less effective over time (immunosenescence), which affects the quality and number of immune system cells generated.(58) Given the scale of the impact of age documented within this review, it seems unlikely that its effect can be explained by a single or a small number of comorbidities which are yet to be detected. This opens up the need to explore biological markers, for example ACE2,(59) and markers of immunosenescence.
The strengths of our review include its systematic approach and broad use of search terms to avoid missing studies. We additionally present a quality assessment to aid the interpretation of the strength of the evidence. In some instances, included publications may have focussed on one specific outcome, whereas our quality assessment took the perspective of the outcomes extracted for this review. We were unable to detect instances where two publications used the same patient populations for their analyses, potentially over-emphasising certain findings. Given the global nature of the pandemic, our review includes studies from around the world, albeit with a large preponderance from China, including studies conducted early after the emergence of SARS-CoV-2 when the at-risk population was predominantly those who had contact with Huanan seafood market and their contacts, and not necessarily representative of the general population. We note a particular lack of studies from the African continent and the Americas, which may have implications for generalisability. Given the rapidly evolving literature on COVID-19, we also note our exclusion of studies published after April 2020, and our exclusion of preprints (which was undertaken to ensure that all included studies had undergone an external quality assessment prior to inclusion).
Across the included publications, variability in study design, exposure and outcome measurement, and analyses made exact syntheses of effect sizes across different risk factors very difficult. Measures of disease severity varied, e.g. admission to ICUs or clinical parameters such as percentage oxygen saturation of the blood. Even measures such as admission to ICU can be subjective and may be time-, clinician-, and health system-dependent. If severity is recorded at admission, risk factors may reflect issues associated with delayed access to healthcare, which may differ between settings and healthcare systems. It is also important to note that, in some studies of disease severity, mild disease included both people who were hospitalised with symptoms and asymptomatic individuals identified through contact tracing. Generally, analyses were descriptive or univariable and thus did not control for confounding. As documented above, this may be particularly problematic when it comes to separating the impact of age and the presence of comorbidities, as well as for identifying which comorbidities truly increase risk, given that many patients may have multi-morbidity.
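The concern about confounding by age can be illustrated with a small worked example. The counts below are entirely hypothetical (they are not drawn from any included study) and are constructed so that comorbidity has no effect on mortality within either age stratum. The sketch compares the crude risk ratio with a Mantel-Haenszel risk ratio stratified by age, one standard way of adjusting for a single categorical confounder; the function names are illustrative.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Crude risk ratio from a single 2x2 table."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata.

    Each stratum is (exposed_cases, exposed_total, unexposed_cases, unexposed_total).
    """
    numerator = 0.0
    denominator = 0.0
    for exposed_cases, exposed_total, unexposed_cases, unexposed_total in strata:
        stratum_total = exposed_total + unexposed_total
        numerator += exposed_cases * unexposed_total / stratum_total
        denominator += unexposed_cases * exposed_total / stratum_total
    return numerator / denominator


# HYPOTHETICAL counts: deaths and totals with and without comorbidity, by age group.
# Within each stratum the risk of death is identical with and without comorbidity.
strata = [
    (2, 100, 18, 900),    # under 65: 2% mortality in both groups
    (120, 600, 80, 400),  # 65 and over: 20% mortality in both groups
]

# Collapsing over age gives the crude (unadjusted) comparison
crude_rr = risk_ratio(
    sum(s[0] for s in strata), sum(s[1] for s in strata),
    sum(s[2] for s in strata), sum(s[3] for s in strata),
)
adjusted_rr = mantel_haenszel_rr(strata)

if __name__ == "__main__":
    print(f"crude RR: {crude_rr:.2f}, age-adjusted RR: {adjusted_rr:.2f}")
```

In this constructed scenario the crude analysis suggests that comorbidity more than doubles the risk of death, while the age-stratified estimate shows no association, because comorbidity is simply more common in the older, higher-risk stratum. Real data would rarely separate so cleanly, but the direction of the bias is the one described above.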
The implications of our findings are two-fold for COVID-19, firstly for current public health practice and secondly for the design of future studies. We flag a number of factors of interest that should be considered by governments and public health agencies when designing shielding strategies and the targeting of future vaccines, as well as in mathematical modelling projecting the likely impact of the pandemic over time. We note, however, the need for sensitive handling of population groups deemed to be at higher risk, and how such labelling does not devolve responsibility from public bodies to these individuals for their own welfare.(8) Some public health agencies are now including reporting of potential risk factors in their routine outputs, including ICNARC (included in this review)(29) and the newer European Centre for Disease Prevention and Control reports, which were released after this review was conducted.(60) Our review demonstrates both the volume of literature that can be published within only a few months since the appearance of an emerging infectious disease, and the need for coordinated approaches to such pathogens. Global efforts using national datasets are hugely valuable in systematically determining the aetiology of a disease, particularly to detect smaller effect sizes. Determination of the exact threshold of important risk depends on public perceptions of the disease,(61) as well as policy needs. Data collection should be standardised where possible, e.g. by using consistent definitions of outcomes and the treatment of exposures (for example for hypertension, given that blood pressure is continuous). 
(For COVID-19 we note both the valuable World Health Organization interim guidelines on its management in providing consistent approaches for testing and the definition of ARDS,(14) and that platforms such as the International Severe Acute
Finally, appropriately adjusted multivariable analyses should be prioritised, in order to separate the implications of different risk factors and to infer true causal relationships, for example exploring specific markers of comorbidity severity and control, such as the use of specific medications. Early clinical studies during pandemics are critically important and published rapidly under extremely difficult circumstances, but we would argue that high-quality epidemiological studies should also be seen as a priority, and that emergency response plans should include provision of appropriate epidemiological and statistical expertise.

Conclusions
The volume of literature generated in the short time since the appearance of SARS-CoV-2 has been considerable. Many studies have sought to document the risk factors for COVID-19 disease, disease severity and mortality. Age was the only risk factor based on robust studies and with a consistent body of evidence. Mechanistic studies are required to understand why age is such an important risk factor. At the start of pandemics, large, standardised, studies using multivariable analyses (e.g. using national surveillance data) are urgently needed in order to inform stratified approaches to rapidly protecting the population groups most at risk.

Declarations

Ethics approval and consent to participate
This is a systematic review and therefore presents secondary data. No ethical approvals were required.

Consent for publication
Not applicable.

Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Table 1 (fragment)
People with cancers of the blood or bone marrow such as leukaemia who are at any stage of treatment
Being seriously overweight (a BMI of 40 or above)
Those who are pregnant
Data taken from sources (9-11). *These groupings represent individuals advised to get a yearly influenza vaccine as an adult for medical reasons. BMI: body mass index; COPD: chronic obstructive pulmonary disease; MS: multiple sclerosis.

Quality assessment table column headings
Author
Is the study design clearly reported?
Is the hypothesis/aim/objective of the study clearly described?
Are the main outcomes to be measured clearly described in the Introduction or Methods section?
Are the characteristics of the patients included in the study clearly described?
Are the distributions of principal confounders in each group of subjects to be compared clearly described?
Are the main findings of the study clearly described?
Does the study provide estimates of the random variability in the data for the main outcomes?
Have actual probability values been reported for the main outcomes except where the probability value is <0.001?
Was there potential for recall bias in the ascertainment of the exposure?
Was there potential for differential or non-differential misclassification of the exposure?
Was there potential for observer bias in ascertainment of the outcome?
Was there potential for differential or non-differential misclassification of the outcome?
If any of the results of the study were based on 'data dredging' was this made clear?
Do the analyses adjust for different lengths of follow-up of patients, or in case-control studies, is the time period between the intervention and outcome the same for cases and controls?
Were the statistical tests used to assess the main outcomes appropriate?
Were the main outcome measures used accurate (valid and reliable)?
Were the patients in different intervention groups (trials and cohort studies) or were the cases and controls (case-control studies) recruited from the same population?
Were study subjects in different intervention groups (trials and cohort studies) or were the cases and controls (case-control studies) recruited over the same period of time?
Was there adequate adjustment for confounding in the analyses of interest?
Were losses of patients to follow-up taken into account?
Are the study results appropriately interpreted e.g. in terms of the strength of the evidence, its application/implications, causality?
Did the study have sufficient power to detect a clinically important effect where the probability value for a difference being due to chance is less than 5%?*