Critically ill COVID-19 status associated trait genetics reveals CDK6 inhibitors as potential treatment

Despite the recent development of vaccines and monoclonal antibodies preventing SARS-CoV-2 infection, treating critically ill COVID-19 patients remains a top goal. In principle, drug repurposing (the use of an already existing drug for a new indication) could provide a shortcut to a treatment. However, drug repurposing is often very speculative due to a lack of clinical evidence. We report here on a methodology to find and test drug target candidates for drug repurposing. Using UK Biobank data, we matched critically ill COVID-19 cases with healthy controls and screened for significant differences in 33 blood cell types, 30 blood biochemistries, and body mass index. Significant differences in traits that have been associated with critically ill COVID-19 status in prior literature, such as alanine aminotransferase, body mass index, C-reactive protein, and neutrophil cell count, were further investigated. In-depth statistical analysis of COVID-19 associated traits and their genetics using regression modeling and propensity score stratification identified cyclin-dependent kinase 6 (CDK6) as a more promising drug target for the selective treatment of critically ill COVID-19 patients than the previously reported interleukin 6. Four existing CDK6 inhibitors (abemaciclib, ribociclib, trilaciclib, and palbociclib) have been approved for the treatment of breast cancer. Clinical evidence for CDK6 inhibitors in treating critically ill COVID-19 patients has been reported. Further clinical investigations are ongoing.


Introduction
The phenotype of critically ill coronavirus disease 2019 (COVID-19) status differs completely from mild or moderate disease, even among hospitalized cases, by an uncontrolled overreaction of the host's immune system [1-3], a so-called virus-induced immunopathology [4], resulting in acute respiratory distress syndrome (ARDS). Although the molecular mechanism leading to critical illness due to COVID-19 is still unclear, there is evidence that susceptibility and overreaction of the immune system to respiratory infections are both strongly heritable [5,6]. A series of genome-wide association (GWA) studies have been conducted to investigate disease pathogenesis in order to find mechanistic targets for therapeutic development or drug repurposing, as treating the disease remains a top goal despite the recent development of vaccines [7-10]. The results of 46 GWA studies comprising 46,562 COVID-19 patients from 19 countries have been combined in three meta-analyses by the COVID-19 Host Genetics Initiative [10]. Overall, 15 independent genome-wide significant loci associations were reported for COVID-19 infection in general, of which six were found to be associated with critical illness due to COVID-19: 3p21.31 close to CXCR6, which plays a role in chemokine signaling, and LZTFL1, which has been implicated in lung cancer; 12q24.13 in a gene cluster that encodes antiviral restriction enzyme activators; 17q21.31, containing the KANSL1 gene, which has been previously reported for reduced lung function; 19p13.3 within the gene that encodes dipeptidyl peptidase 9 (DPP9); 19p13.2 encoding tyrosine kinase 2 (TYK2); and 21q22.11 encoding the interferon receptor gene IFNAR2. The functions of the genes associated with these six loci are either related to host antiviral defense mechanisms or are mediators of inflammatory organ damage. These results are a good starting point for a better understanding of host genetics in viral infections. Nevertheless, none of these genes encodes an established drug target. Consequently, these studies provide no evidence that supports drug repurposing.
medRxiv preprint, this version posted May 28, 2021. doi: https://doi.org/10.1101/2021.05.18.21256584. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
We present here an approach for drug repurposing based not on disease genetics but on the genetics of disease-associated traits. First, critically ill COVID-19 cases are matched with healthy controls, and the two cohorts are investigated for significant differences in previously reported traits. Traits that differ between cases and controls and that have been associated with critically ill COVID-19 status are then further investigated to find and test established drug target genes for drug repurposing.

Screening for critically ill COVID-19 status associated traits
We adopt from prior literature a definition of the phenotype of critically ill COVID-19 cases as patients who were hospitalized due to confirmed SARS-CoV-2 infection and who required respiratory support and/or died due to infection [11]. Using UK Biobank data [12], we identified 8,153 cases and selected age- and sex-matched healthy controls. To explore how the critically ill cohort differed in general from healthy controls, we screened 64 candidate predictive traits (33 blood cell types, 30 blood biochemistries, and body mass index) that had been measured years before the individuals were affected by COVID-19 (Fig. 1). We observed Bonferroni-corrected statistically significant differences (p ≤ 0.05/64) [13] in 36 traits, confirmed by independent two-sample t-test and Mann-Whitney U-test [14]. In these measures, cases showed significant differences in various traits that have been described as phenotypes for critically ill COVID-19 status. For instance, relative to healthy controls, cases had higher body mass index (BMI), higher reticulocyte cell count, higher inflammatory markers such as alanine aminotransferase, C-reactive protein, cystatin C, and neutrophil cell count, and higher glycated hemoglobin (HbA1c), but lower HDL and LDL cholesterol as well as lower vitamin D levels.
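The screening step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, only the Mann-Whitney U-test is shown (using the normal approximation, without tie or continuity correction), and the only value taken from the text is the Bonferroni threshold of 0.05 divided by the number of traits screened.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_p(cases, controls):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    n1, n2 = len(cases), len(controls)
    ranks = average_ranks(list(cases) + list(controls))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))

def screen_traits(trait_data, alpha=0.05):
    """Keep traits whose p-value passes the Bonferroni threshold alpha / n_traits."""
    threshold = alpha / len(trait_data)  # 0.05 / 64 when 64 traits are screened
    hits = {}
    for name, (cases, controls) in trait_data.items():
        p = mann_whitney_p(cases, controls)
        if p <= threshold:
            hits[name] = p
    return hits
```

Here `trait_data` is assumed to map each trait name to a pair of lists (case values, control values). In practice a statistics library would be preferable for exact p-values and tie handling.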
As an additional comparison group, we selected infectious disease cases from the UK Biobank to assess the extent to which these traits are general indicators of predisposition to severe infections. In total, 23,348 participants were selected who had been diagnosed with a respiratory infection, acute respiratory distress syndrome, influenza, or pneumonia and were hospitalized or had died as a result. Again, cases were matched with healthy controls in order to screen for differences in measures for 33 blood cell types, 30 blood biochemistries, and body mass index. As in the COVID-19 investigation, we found that cases had higher body mass index (BMI), higher reticulocyte cell count, higher inflammatory markers such as alanine aminotransferase, C-reactive protein, cystatin C, and neutrophil cell count, and higher glycated hemoglobin (HbA1c), but lower HDL and LDL cholesterol as well as lower vitamin D levels than healthy controls (Supplementary Information Fig. 1 and [...]).
Traits that have been previously associated with critically ill COVID-19 status and infectious diseases, such as alanine aminotransferase, BMI, C-reactive protein, and neutrophil cell count, were further investigated [15,16].

GWAS results
We next focused on the trait genetics of reported critically ill COVID-19 status associated phenotypes such as BMI, neutrophil cell count, C-reactive protein, and alanine aminotransferase. We ran GWA analyses for these four traits and compared our results with previously reported statistics available from the NHGRI-EBI GWAS Catalog [17]. The identified genes were further investigated for already approved drug molecules. We found IL-6, encoding interleukin 6, reported for C-reactive protein [18], and CDK6, encoding cyclin-dependent kinase 6 (CDK6), reported for BMI [19] and neutrophil cell count [20]. We could not confirm the IL-6 signal (rs2097677) in our GWA analysis for C-reactive protein (Supplementary Information Fig. 3). Nevertheless, we continued our research with the reported drug targets interleukin 6 and interleukin 6 receptor (IL-6R). We furthermore confirmed the single nucleotide polymorphisms (SNPs) rs42044 and rs445 in the CDK6 gene as significant in our GWA analyses for BMI and neutrophil cell count (Supplementary Information Fig. 4).
The allele distributions of rs2097677, rs42044, and rs445 in cases and controls can be found in Supplementary Information Tab. 1 and 2.

Regression modeling
Logistic regression models were built to examine the relationship between a series of candidate predictive traits (age, alanine aminotransferase, BMI, C-reactive protein, and neutrophil cell count) and critically ill COVID-19 status. All of these traits apart from age were significant predictors of critically ill COVID-19 status (as illustrated for neutrophil cell count in Fig. 2). As expected, no relationship between age and disease status was found, because the control group was matched to the cases by age and sex. A drop-one analysis revealed that each trait explains unique variance in critically ill COVID-19 status.

Propensity score analysis
Propensity score analysis is a technique for estimating the effect of a treatment on an outcome independent of any observed factors that covary with that treatment and [...]

Mendelian randomization
Mendelian randomization (MR) is a robust and accessible tool to examine the causal relationship between an exposure variable and an outcome from GWAS summary statistics [22]. We employed two-sample summary data Mendelian randomization to further validate causal effects of neutrophil cell count genes on the outcome of critically ill COVID-19 status. We used independent GWAS summary data for neutrophil cell count (exposure) published by Vuckovic et al. [23] [...]
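To make the idea of two-sample summary-data MR concrete, the following Python sketch combines per-variant effect estimates into a single causal estimate. The text does not state which MR estimator was used, so the fixed-effect inverse-variance weighted (IVW) estimator shown here is an assumption chosen for illustration, and all function and parameter names are hypothetical.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    beta_exposure: per-variant effects on the exposure (e.g. neutrophil cell count)
    beta_outcome:  per-variant effects on the outcome (e.g. critical illness)
    se_outcome:    standard errors of the outcome effects
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)

def two_sided_p(estimate, std_error):
    """Two-sided p-value from a normal approximation of estimate / std_error."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2))
```

The sketch assumes that the variants are independent and that both sets of effects are aligned to the same effect allele. Dedicated MR software additionally provides sensitivity analyses that this example omits.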

Discussion
We have described a method for identifying drug targets for the treatment of disease based not directly on the genetics of the disease itself, but rather on the genetics of disease-associated traits. Using data from the UK Biobank, we found evidence of a causal relationship between a series of traits and critical illness due to COVID-19.
Using genome-wide associations, we identified genetic markers associated with these traits. Based on these two steps, we were able to identify CDK6 as a potential drug target for critically ill COVID-19 status. The four CDK6 inhibitors abemaciclib, palbociclib, ribociclib, and trilaciclib have already been approved for breast cancer and could potentially be repurposed to treat critically ill COVID-19 patients.
Our procedure worked as follows. We matched reported critically ill COVID-19 cases from the UK Biobank with healthy controls and checked for significant differences between cases and controls in 64 candidate predictive traits. Cases showed significant differences in alanine aminotransferase, BMI, C-reactive protein, and neutrophil cell count. These measures were taken from the individuals concerned years before infection. We hypothesize that the genetic drivers of a disease-associated trait indicate a good drug target to treat the disease. In the literature we found the genes IL-6, encoding interleukin 6, as a driver of C-reactive protein, and CDK6, encoding cyclin-dependent kinase 6, as a driver of BMI and neutrophil cell count. Three drugs that inhibit interleukin 6 have already been approved, acting either directly (siltuximab for Castleman's disease) or indirectly via the interleukin 6 receptor (tocilizumab and sarilumab for rheumatoid arthritis). The four drugs abemaciclib, palbociclib, ribociclib, and trilaciclib, which are already approved for the treatment of breast cancer, target CDK4/6.
Regression models that tested all traits together showed that C-reactive protein and neutrophil cell count explained independent variance in critically ill COVID-19 status in the presence of other traits such as BMI. Propensity score stratification found evidence of a causal relationship between both C-reactive protein and neutrophil cell count and the prevalence of critically ill COVID-19 status.
It is important to note that the Mendelian randomization results did not confirm a causal role for either C-reactive protein or neutrophil cell count. However, MR is typically used where there is a direct relationship between gene and outcome. In our case we are looking for a relationship that is mediated by a viral infection, which adds a great deal more noise. This, compounded by the fact that MR needs a larger sample size [24] than we have available, might account for our not finding evidence of a relationship in this analysis.
The role of C-reactive protein in COVID-19 can be explained by the previously reported disease mechanism. The phenotype of critically ill COVID-19 status differs completely from mild or moderate disease, even among hospitalized cases, by an uncontrolled overreaction of the host's immune system [1-3]. The most prominent difference between critically ill and moderate COVID-19 cases is the response to immunosuppressive therapy. In patients without respiratory failure, there is a trend indicating that treatment with the corticosteroid dexamethasone is harmful, whereas among patients with critical respiratory failure, it has substantial benefit [25]. Therefore, immunomodulatory therapies, such as the interleukin 6 receptor (IL-6R) antagonists tocilizumab and sarilumab, have been successfully tested to block the immune system's overreaction in the form of a cytokine storm in critically ill COVID-19 patients [26].
The role of neutrophil count in the disease mechanism can also be explained.
Neutrophils are white blood cells and an important component of the host defense against invading pathogens. Critically ill COVID-19 status is characterized by infiltration of the lungs with macrophages and neutrophils that cause diffuse alveolar damage, the histological equivalent of ARDS [27-29]. Neutrophils form a sophisticated network of DNA called neutrophil extracellular traps (NETs) through NETosis, the release of web-like structures of nucleic acids wrapped with histones that detain viral particles [30]. However, ineffective clearance and regulation of NETs result in pathological effects such as thromboinflammation [31]. On the one hand, NETs are essential and efficient for trapping the virus. On the other, they cause damage to the organism by triggering highly intense immunological and inflammatory processes. CDK4/6 have been previously described as regulators of NETosis. The CDK4/6 inhibitors abemaciclib and palbociclib block NET formation in a dose-responsive manner but do not inhibit the oxidative burst, phagocytosis, or degranulation, indicating that CDK4/6 inhibition specifically affects NET production, rather than generally modulating inflammatory pathways as IL-6 inhibitors do [32].
There is good reason to think that IL-6 and CDK4/6 might both be good drug targets for the treatment of COVID-19. However, given that neutrophils have a direct role in the thromboinflammatory process in critically ill COVID-19 patients, we believe that blocking neutrophils represents a more selective strategy than suppressing the immune system as a whole.
There are two phases of COVID-19 infection (Fig. 3). At the beginning of infection, the virus enters the cell and starts viral replication. Here, vaccines and monoclonal antibodies block viral entry. However, once the virus has entered the cell via the ACE2 receptor, no therapeutic option for intervention currently exists. After viral infection, first immune reactions are observed, such as a decrease of lymphocyte cell count (lymphopenia). This phase is called the viral response phase. At the beginning of the next phase, the so-called host response phase, an overreaction of the host immune system occurs. The detailed mechanism leading to the overreaction of some patients' host immune systems is still unknown. From our results we assume that already manifested inflammations, indicated by high C-reactive protein levels and high neutrophil cell count, trigger the immune system's overreaction, resulting in thrombotic inflammation in the lungs and, from there, respiratory failure. In summary, the host response decides whether an infectious disease like COVID-19 has a mild course or leads to respiratory failure.
The cytokine IL-6 plays a central role in host response. On one hand, IL-6 binds to liver cells inducing the release of C-reactive protein that binds to phosphocholine of dead cells and recruits phagocytes. On the other hand, IL-6 stimulates the production of neutrophils and, thus, indirectly induces NETosis. Therefore, it is reasonable to inhibit IL-6 in a therapeutic intervention. However, immunomodulators can only be administered to tackle the overreaction of the immune system. Immunomodulators given in the early infection phase are harmful for patients. In contrast to IL-6 inhibitors, CDK4/6 inhibitors selectively block the NET formation and have no impact on other important host immune reactions such as phagocytosis.
Consequently, they can be given earlier in the infection than IL-6 inhibitors, thus filling the therapeutic gap between vaccines and monoclonal antibodies in early infection and immunomodulators in the late stage. Our statistical analyses identified an independent effect of neutrophil cell count on critically ill COVID-19 status, providing evidence that therapeutic intervention at this later stage is still effective. We therefore hypothesize that CDK4/6 inhibitors are superior to IL-6 inhibitors in the treatment of critically ill COVID-19 patients. This is supported by the reported cases of breast cancer patients and their disease course during CDK4/6 therapy, such as the report of Grinshpun et al. [33] that a breast cancer patient on CDK4/6 inhibitor therapy had a unique disease course, halting the full presentation of the disease. Once the drug was withdrawn, the full classic spectrum of illness appeared, including a troubling desaturation necessitating a prolonged hospital stay for close monitoring of the need for invasive ventilation [33].
To conclude, we propose CDK6 as a new and plausible drug target and the repurposing of the already approved breast cancer drugs abemaciclib, ribociclib, trilaciclib, and palbociclib as a possible treatment against critically ill COVID-19 status. CDK4/6 inhibitors have the advantage of targeting the thromboinflammation earlier and more selectively than the reported IL-6 inhibitors. Additionally, CDK4/6 inhibitors are chemical compounds and are therefore easier to store and administer than monoclonal antibodies. Clinical evidence for CDK6 inhibitors in treating critically ill COVID-19 patients has already been reported. Further clinical investigations are ongoing. We have also presented a novel methodology to find and test drug target candidates for drug repurposing in population data. Our approach is not limited to drug repurposing, but can also be used to validate new drug targets. This highlights the importance of biobanks for global health in a pandemic.

Recruitment of cases and controls
The COVID-19 phenotype was defined based on the rich information made available by the UK Biobank project, which has been collecting COVID-19 outcomes for its large cohort of participants. The COVID-19 outcomes up until 28 October 2020 were collected, and cases were defined as reported previously [8]. Briefly, severe cases were defined as patients who died or were hospitalized (cause of death or diagnosis contains the ICD10 code for COVID-19, U071) or were on a ventilator (operation code contains E85*). Cases were not filtered based on COVID-19 test outcomes. The remaining individuals from the UK Biobank were defined as potential controls.
Subsequently, patients of European ancestry were selected based on the UK Biobank phenotype, and covariate information (age and sex) was carried over for the resulting dataset. Covariate distributions were matched to result in the same number of cases and controls. Variants reported by Pairo-Castineira et al. [8] and Ellinghaus et al. [7], as well as variants reported by the ClinVar database [34] for the genes reported by these papers, were included in the dataset.
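One way to obtain equal numbers of cases and controls with matching covariates is exact 1:1 matching on age and sex. The Python sketch below illustrates that idea only. The text does not specify the matching algorithm, so the exact-matching rule, the random draw, and the record layout (dictionaries with `age` and `sex` keys) are assumptions.

```python
import random
from collections import defaultdict

def match_controls(cases, potential_controls, seed=0):
    """Pair each case with one unused control of the same age and sex.

    Cases without an available control are skipped, so the result always
    contains the same number of cases and controls.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for person in potential_controls:
        pool[(person["age"], person["sex"])].append(person)
    for bucket in pool.values():
        rng.shuffle(bucket)  # random draw within each (age, sex) group
    matched = []
    for case in cases:
        bucket = pool[(case["age"], case["sex"])]
        if bucket:
            matched.append((case, bucket.pop()))  # pop: sample without replacement
    return matched
```

Because each control is removed from the pool once used, no control is paired with more than one case.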

GWAS
The UK Biobank genotypes for the cases and controls were extracted to create a dataset that was then submitted to a series of quality control steps with the aim of removing biases in the downstream analysis, as described in Marees et al. [35]. First, we filtered SNPs and individuals based on their missingness in the dataset. This excludes SNPs for which genotyping information is unavailable or of poor quality in a high proportion of subjects. Similarly, individuals for whom a large proportion of SNPs could not be measured were excluded. This was achieved in two steps: first a lenient threshold of 0.2 (i.e. > 20%) was applied to remove the clear outliers, followed by a more stringent threshold of 0.02 (i.e. > 2%). SNP filtering was performed before individual filtering. Next, all variants not on autosomal chromosomes were removed. Next, variants that deviate from Hardy-Weinberg equilibrium, a common indicator of genotyping errors, were removed in a two-step process whereby we first applied a lenient threshold of 1e-6, followed by a more stringent threshold of 1e-10. Thereafter, individuals were filtered out based on their heterozygosity rates, which can indicate sample contamination. Individuals deviating by more than 3 standard deviations from the mean rate of all samples were filtered out. To assess the heterozygosity rate per sample, variants in linkage disequilibrium with each other were pruned by scanning the genome with a window size of 50 variants, a step size of 5, and a pairwise correlation threshold of 0.2. Next, related individuals were removed. To achieve this, their identity by descent (IBD) coefficients were calculated and only one individual per related cluster was kept. Then, the small proportion of missing genotypes was imputed, and additional variants reported by Pairo-Castineira et al. [8] and Ellinghaus et al. [7], as well as variants reported by the ClinVar database [34], were included in the dataset from the UK Biobank imputed variants.
This yielded a dataset with a total of 335,332 quality-controlled variants. Finally, the population structure of the samples was analyzed in two stages to identify internal stratifications, which was used to filter out any [...]
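Two of the quality control steps above, the two-step missingness filter and the heterozygosity outlier filter, can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the pipeline that was run: genotypes are assumed to be stored as one list per individual with `None` for a missing call, heterozygosity rates are assumed to be precomputed, and all function names are hypothetical. The thresholds (0.2, then 0.02, and 3 standard deviations) are the ones given in the text.

```python
import statistics

def filter_missingness(sample_ids, snp_ids, genotypes, threshold):
    """Drop SNPs, then individuals, whose missing fraction exceeds `threshold`."""
    if not genotypes or not snp_ids:
        return sample_ids, snp_ids, genotypes
    n = len(genotypes)
    # SNP filtering is performed before individual filtering.
    keep_snps = [j for j in range(len(snp_ids))
                 if sum(row[j] is None for row in genotypes) / n <= threshold]
    if not keep_snps:
        return sample_ids, [], [[] for _ in genotypes]
    rows = [[row[j] for j in keep_snps] for row in genotypes]
    keep_ind = [i for i, row in enumerate(rows)
                if sum(g is None for g in row) / len(row) <= threshold]
    return ([sample_ids[i] for i in keep_ind],
            [snp_ids[j] for j in keep_snps],
            [rows[i] for i in keep_ind])

def two_step_missingness(sample_ids, snp_ids, genotypes):
    """Apply the lenient (0.2) and then the stringent (0.02) missingness threshold."""
    for threshold in (0.2, 0.02):
        sample_ids, snp_ids, genotypes = filter_missingness(
            sample_ids, snp_ids, genotypes, threshold)
    return sample_ids, snp_ids, genotypes

def heterozygosity_outliers(het_rates, n_sd=3.0):
    """Return samples whose heterozygosity rate deviates more than n_sd SDs from the mean."""
    rates = list(het_rates.values())
    mean = statistics.mean(rates)
    sd = statistics.stdev(rates)
    return {s for s, r in het_rates.items() if abs(r - mean) > n_sd * sd}
```

For genome-scale data these steps are normally run with dedicated genetics software; the sketch only makes the filtering logic explicit.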

Regression modeling
Logistic regression models were fitted using the glm function in R (www.R-project.org). A drop-one model comparison procedure was performed in order to determine whether each of a set of traits accounts for unique variance in critically ill COVID-19 disease status.
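The drop-one procedure can be illustrated with a small, self-contained Python sketch. It is not the R code used for the analysis: logistic regression is fitted here by plain Newton-Raphson iterations, each predictor is tested with a likelihood-ratio test on one degree of freedom (an assumed choice of test), and all names are hypothetical. The sketch assumes data without perfect separation between cases and controls.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(rows, y, n_iter=25):
    """Fit a logistic regression with intercept; return (coefficients, deviance)."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(x[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(x, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)  # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
    deviance = 0.0
    for xi, yi in zip(x, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        # log(1 + exp(eta)) - y * eta is the negative log-likelihood of one observation
        deviance += 2.0 * (math.log1p(math.exp(eta)) - yi * eta)
    return beta, deviance

def drop_one(rows, y, names):
    """For each predictor, compare the full model with the model lacking that predictor."""
    _, full_dev = fit_logistic(rows, y)
    results = {}
    for j, name in enumerate(names):
        reduced = [[v for k, v in enumerate(r) if k != j] for r in rows]
        _, red_dev = fit_logistic(reduced, y)
        lr = max(red_dev - full_dev, 0.0)
        # Upper tail of a chi-square distribution with 1 degree of freedom
        results[name] = (lr, math.erfc(math.sqrt(lr / 2.0)))
    return results
```

A predictor with a small p-value in `drop_one` improves the fit beyond what the remaining predictors explain, which is the sense in which a trait accounts for unique variance.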

Propensity score analysis
Using the method of Imai and van Dyk [21], we regressed neutrophil count on BMI, age, and gender and then used the resulting model predictively to generate propensity values for our population. We then stratified the population into propensity deciles (the members of which are approximately matched in terms of their propensity), estimated the effect of z-transformed neutrophil count on disease status within each band using logistic regression, and calculated a weighted average.
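The stratify-then-average logic of this procedure can be sketched in Python as follows. Several details are assumptions made for illustration: strata are formed by rank, the weighted average uses stratum sizes as weights (the text does not state the weights), and the within-stratum estimator is passed in as a callable. In the analysis that callable would be the logistic regression coefficient of z-transformed neutrophil count; the `least_squares_slope` function below is only a simple stand-in, and all names are hypothetical.

```python
def assign_strata(propensity, n_strata=10):
    """Assign each subject to a rank-based propensity stratum (0 .. n_strata - 1)."""
    order = sorted(range(len(propensity)), key=lambda i: propensity[i])
    strata = [0] * len(propensity)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(propensity)
    return strata

def stratified_effect(propensity, treatment, outcome, estimate_effect, n_strata=10):
    """Average within-stratum effect estimates, weighted by stratum size (assumed weights)."""
    strata = assign_strata(propensity, n_strata)
    total, weighted = 0, 0.0
    for s in range(n_strata):
        idx = [i for i, v in enumerate(strata) if v == s]
        if len(idx) < 2:
            continue  # too few subjects to estimate an effect
        effect = estimate_effect([treatment[i] for i in idx],
                                 [outcome[i] for i in idx])
        weighted += len(idx) * effect
        total += len(idx)
    return weighted / total

def least_squares_slope(x, y):
    """Stand-in estimator: ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
```

Within a stratum the subjects have similar propensity values, so the covariates used to build the propensity function are approximately balanced and the within-stratum estimate is less affected by them.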

Mendelian randomization
We used independent GWAS summary data for neutrophil cell count (exposure) published by Vuckovic et al. [23] (GCST90002398, downloaded January 15th 2021) and summary data for critically ill COVID-19 status (outcome) published by the COVID-19 Host Genetics Initiative (https://www.covid19hg.org/results -COVID19hg [...]

[Figure caption fragment] [...] by Mann-Whitney U-test. In these measures, taken years prior to infection, cases showed significant differences in various traits that have later been described as phenotypes associated with critical illness due to COVID-19. For instance, cases had higher body mass index (BMI), higher reticulocyte cell count, higher inflammatory markers such as alanine aminotransferase, C-reactive protein, cystatin C, and neutrophil cell count, and higher glycated hemoglobin (HbA1c), but lower HDL and LDL cholesterol as well as lower vitamin D levels than healthy controls.

[Figure caption fragment] [...] Therefore, it is reasonable to inhibit IL-6 in a therapeutic intervention. However, while immunomodulators could be administered to tackle the overreaction of the immune system, they are harmful for patients in the early infection phase. In contrast to IL-6 inhibitors, CDK4/6 inhibitors selectively block NET formation and have no impact on other important immune reactions such as phagocytosis. Consequently, they can be given earlier in the infection than IL-6 inhibitors, filling the therapeutic gap between vaccines and monoclonal antibodies in early infection and immunomodulators in the later stage.

Tab. 2. Propensity score matching was employed in order to determine whether a causal relationship exists between neutrophil cell count and the prevalence of [...]