PNPLA3 and TLL-1 polymorphisms affect disease severity in patients with COVID-19

Stefania Grimaudo (Stefania.grimaudo@unipa.it), University of Palermo
Emanuele Amodio, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties “G. D’Alessandro”, University of Palermo, Italy
Rosaria Maria Pipitone, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties “G. D’Alessandro”, University of Palermo, Italy
Carmelo Massimo Maida, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties “G. D’Alessandro”, University of Palermo, Italy
Stefano Pizzo, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties “G. D’Alessandro”, University of Palermo, Italy
Tullio Prestileo, Infectious Diseases Unit & Centre for Migration and Health, ARNAS Ospedale Civico-Benfratelli, Palermo, Italy
Fabio Tramuto, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties “G. D’Alessandro”, University of Palermo, Italy
Davide Sardina, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties “G. D’Alessandro”, University of Palermo, Italy
Francesco Vitale, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties “G. D’Alessandro”, University of Palermo, Italy
Alessandra Casuccio, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties “G. D’Alessandro”, University of Palermo, Italy
Antonio Craxì, Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties “G. D’Alessandro”, University of Palermo, Italy


Introduction
More than 7 million cases of infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) had been reported worldwide by June 8, 2020, with over 400,000 deaths due to the virus [1]. While the clinical spectrum is extremely broad, ranging from mild or asymptomatic cases to severe acute respiratory syndrome, it became immediately apparent that the outcome of infection is strongly conditioned by host-related factors such as age, gender and pre-existing underlying illnesses. Although SARS-CoV-2 is an RNA virus with a relatively fast evolutionary rate [2], no clear evidence has emerged so far that viral variability is a major determinant of pathogenicity. Likewise, SARS-CoV-2 viral load, although linked to the phase of infection, does not appear to be a major determinant of pathogenicity, even though a precise quantification of viral load in oropharyngeal swab samples remains elusive [3].
Although the pathogenetic mechanism of coronavirus disease 2019 (COVID-19) is still partly unclear, the course and outcome of the disease seem to be significantly influenced by host factors that support a strong proinflammatory response and induce a massive release of cytokines. The resulting "cytokine storm" ultimately causes severe alveolar damage and also multiorgan failure [4]. It is thus conceivable that the complex host genetic background, in terms of polymorphisms in genes involved in SARS-CoV-2 receptor-dependent endocytosis, antiviral responses, modulation of cell infection and reinfection, inflammation or immune stimulation, may play a key role in the pathogenesis and outcome of COVID-19. Family clustering of severe cases, reported since the first phase of the pandemic, would support the possibility of a genetic predisposition [5].
To probe the possible effects of host genetic polymorphisms in different segments of the innate antiviral response, we aimed to assess selected functional single nucleotide polymorphisms (SNPs) in genes involved in the control of viral infection through induction of inflammation (IFNL3/IFNL4), in macrophage polarization (MERTK), and in tissue and systemic inflammation (PNPLA3).
These SNPs were investigated in DNA samples derived directly from oropharyngeal swabs collected, at the onset of the pandemic phase, from a cohort of SARS-CoV-2 patients resident in Sicily, Southern Italy. In addition, we evaluated an SNP of Tolloid-like 1 (TLL-1), a secreted protease capable of activating complement through the C1q pathway and also potentially able to activate the Spike protein of SARS-CoV-2.

Materials and Methods
Our observational study was carried out from February 24, 2020 to April 8, 2020 and included all consecutive patients (N = 383) with laboratory-confirmed SARS-CoV-2 infection whose oropharyngeal or nasopharyngeal swabs had been sent to the referral Laboratory for COVID-19 Surveillance for Western Sicily, located at the University Hospital "P. Giaccone" of Palermo. Laboratory confirmation of SARS-CoV-2 infection was defined as a positive result of real-time reverse transcriptase polymerase chain reaction (real-time RT-PCR) on nasal, pharyngeal or nasopharyngeal swabs, according to the Centers for Disease Control and Prevention protocol [6].
For each patient, sociodemographic variables (age, sex and residency) were collected at baseline, whereas clinical outcomes (home isolation, hospitalization, admission to an intensive care unit [ICU] and death) were obtained from the clinical profiles centrally provided by the Italian National Institute of Health (Istituto Superiore di Sanità, ISS) and, when available, by direct contact with the hospitals involved in the care of each recruited patient. Each patient was monitored for at least 21 days after recruitment, and the final day of follow-up was April 8, 2020. Because of the observational, referral-based design, no follow-up biological samples were available. Before swab sampling, individual informed consent was obtained by the health care provider. Approval to conduct the study was obtained from the Ethics Committee of the A.O.U.P. "P. Giaccone" of Palermo, Italy. The research reported in this paper is in accordance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects. […] Foster City, CA, USA) using commercial (MERTK rs4374383, PNPLA3 rs738409, TLL-1 rs17047200) or custom (IFNL3 rs12979860, IFNL4 rs368234815) genotyping assays (Thermo Fisher Scientific). Complete genotyping was not possible for all patients because of the scarce amount of DNA available from swabs. Genotype calling was performed with Applied Biosystems software, version 2.3. Genotyping was performed blinded to patient characteristics. Before SNP testing, samples were anonymized, and a unique, randomly generated identification code was assigned to each record and to the corresponding swab. Researchers performing the genetic analyses were unable to identify patients at any stage, and no permanent record linking these data to patient IDs was produced.
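The anonymization step can be illustrated with a minimal sketch. It assumes a hypothetical record layout (dicts with a `patient_id` key) and draws unique random codes with Python's `secrets` module; the study does not describe how its codes were actually generated. The ID-to-code mapping exists only inside the function and is not returned, mirroring the statement that no permanent linking record was kept (in practice the same code would also be written on the corresponding swab before the mapping is discarded).

```python
import secrets


def anonymize(records, code_hex_len=8):
    """Replace each record's 'patient_id' with a unique random code.

    Hypothetical schema: each record is a dict with a 'patient_id' key.
    The ID-to-code mapping lives only inside this function and is not
    returned, so no permanent link between codes and patient IDs is kept.
    """
    id_to_code = {}
    used = set()
    anonymized = []
    for record in records:
        pid = record["patient_id"]
        if pid not in id_to_code:
            code = secrets.token_hex(code_hex_len // 2)  # code_hex_len hex digits
            while code in used:  # redraw on the (unlikely) collision
                code = secrets.token_hex(code_hex_len // 2)
            used.add(code)
            id_to_code[pid] = code
        clean = {k: v for k, v in record.items() if k != "patient_id"}
        clean["code"] = id_to_code[pid]
        anonymized.append(clean)
    return anonymized
```

Using `secrets` rather than `random` makes the codes unpredictable, so they cannot be regenerated from a seed to re-identify records.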

Statistical analysis
Statistical analyses were carried out by researchers not involved in dataset storage and management. No a priori sample size calculation was performed; the final sample size was the number of patients recruited during the entire study period.
Continuous variables are presented as median and interquartile range (IQR) and categorical variables are expressed as number of patients (percentage).
For most analyses, especially those involving SNPs, patients were categorized into two main groups: mild disease (patients managed in home isolation and those hospitalized without complications) and severe disease (patients admitted to intensive/critical care and patients who died during the observation period, regardless of initial allocation).
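The grouping rule above can be written as a small function. The `care_setting` labels ('home', 'ward', 'icu') and the `died` flag are hypothetical names chosen for this sketch; the point it makes explicit is that death or ICU admission classifies a patient as severe whatever the initial allocation.

```python
def severity_group(care_setting: str, died: bool) -> str:
    """Classify a patient as 'mild' or 'severe' (hypothetical labels).

    care_setting: 'home' (home isolation), 'ward' (hospitalized without
    complications) or 'icu' (intensive/critical care).
    """
    if died or care_setting == "icu":
        # death or ICU admission: severe, regardless of initial allocation
        return "severe"
    if care_setting in ("home", "ward"):
        return "mild"
    raise ValueError(f"unknown care setting: {care_setting!r}")
```

A patient first isolated at home who later dies is therefore counted as severe.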
Because some data were missing, the distribution across age subgroups was computed on the data available for each variable, and the remaining percentages were calculated on the number of patients with available data in each subgroup. Univariate analysis was used to identify variables associated with the development of severe disease. The Mann-Whitney rank-sum test or ANOVA was used to compare non-normally distributed continuous variables between age subgroups and between patients with and without severe disease.
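The Mann-Whitney comparison can be sketched in a few lines of standard-library Python. This is an illustration, not the authors' R code: it uses the large-sample normal approximation with a tie correction and no continuity correction, so its p-values can differ slightly from R's `wilcox.test`, which by default uses exact p-values for small samples without ties and a continuity correction otherwise.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # tie correction: sum of (t^3 - t) over groups of t tied values
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
```

For example, `mann_whitney_u(ages_severe, ages_mild)` would compare ages between the two severity groups (the variable names are hypothetical).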
Chi-square or Fisher exact tests were used for categorical variables, as appropriate. Multivariable logistic regression models were built to assess the association between the investigated genotypes and severe disease while adjusting for potential confounders (age and sex). Each multivariable model was fitted first on all patients and then only on patients aged 65 years or less. Because some host genotypes were rare, each multivariable model included only one genotype at a time. All statistical tests were two-tailed, and statistical significance was defined as P ≤ .05. The analyses were not adjusted for interaction and, given the possibility of type II error, the findings should be interpreted as exploratory and descriptive. Analyses were performed using R software, version 3.6.1 [7].
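For small genotype-by-outcome tables the Fisher exact test is the usual choice. The sketch below is a standard-library implementation of the two-sided test as conventionally defined (summing the probabilities of all tables, with the same margins, that are no more probable than the observed one), which is the convention R's `fisher.test` documents; it is not the authors' code, and the row/column layout is an assumption for illustration.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    For instance, rows = genotype carriers / non-carriers and
    columns = severe / mild disease. With all margins fixed, the top-left
    cell follows a hypergeometric distribution; the two-sided p-value sums
    the probabilities of every table no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # probability that the top-left cell equals x, given the margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # relative tolerance so that floating-point ties count as "as extreme"
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)
```

`math.comb` requires Python 3.8 or later; the chi-square test becomes a reasonable alternative once all expected cell counts are large.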

Bioinformatic analysis
For TLL-1, cleavage sites on the target substrate were predicted with SitePrediction [8], and the top 20 predictions ranked by average score are reported. The amino acid sequence of the target substrate, the Spike protein, was retrieved in FASTA format from UniProt (accession P0DTC2), and cleavage-site specificity data for members of the M12.016 subfamily were retrieved from MEROPS [9].
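The general idea behind this kind of prediction (learn a per-position residue profile from known cleavage sites, then score every window of the substrate) can be sketched as follows. This is a simplified frequency-profile scan written for illustration; it is not SitePrediction's actual scoring scheme, and the window width, the P1 convention and the example sites are assumptions.

```python
def build_profile(known_sites):
    """Per-position residue frequencies from equal-length known cleavage
    windows (e.g. P4..P4' octamers, cleavage between positions 4 and 5)."""
    width = len(known_sites[0])
    if any(len(s) != width for s in known_sites):
        raise ValueError("all known sites must have the same length")
    profile = []
    for pos in range(width):
        counts = {}
        for site in known_sites:
            counts[site[pos]] = counts.get(site[pos], 0) + 1
        profile.append({aa: k / len(known_sites) for aa, k in counts.items()})
    return profile


def scan(substrate, profile, top=20):
    """Score every window of the substrate by its average per-position
    frequency; return the `top` best as (score, P1 position, window)."""
    width = len(profile)
    hits = []
    for start in range(len(substrate) - width + 1):
        window = substrate[start:start + width]
        score = sum(profile[i].get(aa, 0.0) for i, aa in enumerate(window)) / width
        # 1-based index of P1, assuming an even-width window centred on the scissile bond
        p1 = start + width // 2
        hits.append((score, p1, window))
    hits.sort(key=lambda h: (-h[0], h[1]))  # best score first, then by position
    return hits[:top]
```

With the UniProt P0DTC2 sequence as `substrate` and the known MEROPS sites as training windows, `scan` would return a ranked list loosely analogous to Table 6, although its scores would not match SitePrediction's. The test octamers below are made up.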

Results
The main features of the 383 COVID-19 patients included in the study are summarized in Table 1. Overall, the M/F ratio was 1.18 and the median age was 58 years (IQR 44-74 years), with 39.95% of subjects aged 41 to 64 years. A total of 148 (38.64%) patients were hospitalized, and 32 (8.36%) died during the follow-up period. Overall, 330 patients were classified as having mild disease and 53 as having severe disease.
In the whole COVID-19 cohort, the genotype distributions of IFNL3, IFNL4, MERTK and PNPLA3 were consistent with Hardy-Weinberg equilibrium, whereas the genotype distribution of the TLL-1 variant rs17047200 (A>T) (Table 3) deviated significantly from Hardy-Weinberg equilibrium, as more TT homozygotes were observed (11) than expected (6.3).
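A Hardy-Weinberg check of this kind can be sketched as a Pearson chi-square test with one degree of freedom. The text does not state which HWE test was used, and the full genotype counts behind the 11 observed versus 6.3 expected TT homozygotes are not given here, so the function below is a generic sketch rather than a reproduction of Table 3. With expected counts this small, an exact HWE test is often preferred.

```python
import math


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Pearson chi-square (1 df) test of Hardy-Weinberg equilibrium for a
    biallelic SNP. Returns (expected counts, chi-square, p-value)."""
    n = n_hom_ref + n_het + n_hom_alt
    q = (2 * n_hom_alt + n_het) / (2 * n)  # alternative-allele frequency
    p = 1 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return expected, chi2, math.erfc(math.sqrt(chi2 / 2))
```

An excess of observed over expected homozygotes for the alternative allele, as reported for TLL-1 TT, inflates the third term of the chi-square sum.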
The assessment of risk factors associated with a severe outcome is reported in Table 4. Male subjects and older patients were at significantly higher risk of a severe outcome (p = 0.02 and p < 0.001, respectively). In the entire cohort, none of the host SNPs was associated with COVID-19 severity. When only patients aged 65 years or less were considered, two genotypes were significantly associated with an increased risk of severe outcome: PNPLA3 rs738409 GG (p = 0.035) and TLL-1 rs17047200 TT (p = 0.029).
These associations were confirmed by the multivariable logistic regression analyses performed on patients aged 65 years or less.
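The subgroup model (severe disease on one genotype plus age and sex, restricted to patients aged 65 years or less) can be sketched with a small Newton-Raphson logistic regression. The authors used R 3.6.1; this pure-Python version is a stand-in, not their code. The record layout and the `genotype_odds_ratio` helper are hypothetical, and the sketch reports no standard errors or p-values and does not guard against separation, which rare genotypes can cause.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson.

    X: list of covariate rows (an intercept column is prepended); y: 0/1.
    Returns the coefficients, intercept first.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # score vector X'(y - mu)
        hess = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for x, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = mu * (1.0 - mu)
            for i in range(k):
                grad[i] += (yi - mu) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def genotype_odds_ratio(patients, genotype_key, max_age=65):
    """Adjusted OR for one genotype (0/1 indicator) in patients aged <= max_age.

    Hypothetical record layout: dicts with 'age', 'male' (0/1), 'severe' (0/1)
    and the genotype indicator under genotype_key. Model: severe ~ genotype + age + sex.
    """
    sub = [p for p in patients if p["age"] <= max_age]
    X = [[p[genotype_key], p["age"], p["male"]] for p in sub]
    y = [p["severe"] for p in sub]
    return math.exp(fit_logistic(X, y)[1])
```

Including only one genotype per model, as described in the statistical methods, keeps the number of parameters small relative to the few severe cases.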
In silico analysis predicted at least 20 candidate cleavage sites for TLL-1 protease activity on the Spike protein (Table 6), suggesting that TLL-1 could potentially be involved in Spike protein cleavage.

Discussion
To our knowledge, this is the first study in the field of SARS-CoV-2 to use nucleic acid extracts generated from swabs during diagnostic processing for COVID-19 to evaluate the host genetic profile. This approach, originally devised for other respiratory viruses, had suggested that the IFN lambda system could be a determinant of the outcome of such infections [11]. Although we could not find any significant relation between the IFN lambda system and the outcome of COVID-19, the use of genetic material obtained from swabs could be of major relevance for wide, population-based studies of the genetic background of people infected by SARS-CoV-2. In the still unresolved pathogenetic scenario of COVID-19, the identification of genetic variants associated with a more prolonged course or a severe outcome of infection would support the development of predictive tools to stratify subjects by risk class at presentation. Moreover, the identification of key genes could contribute to a better understanding of the pathways involved in pathogenesis, providing the basis for rational therapeutic approaches.
As already widely reported, old age and, to a lesser degree, male sex were major determinants of the prognosis of COVID-19 in our cohort as well [12]. Because the data sources were unreliable on this point, we cannot comment on the role of comorbidities in this group. However, most comorbidities in the general Italian population are likely to be prevalent in the last decades of life [13]. Overall, subjects older than 65 years had a seven-fold higher risk of a severe outcome of COVID-19 than younger subjects.
Moreover, in older patients, comorbidities could act as a confounding factor when identifying other risk factors with weaker associations, such as genetic polymorphisms.
In this setting, the role of other factors predisposing to disease severity, including host genetics, is likely to be masked. Conversely, in the younger age group, where demographic variables and comorbidities are less prominent, some of the explored host polymorphisms in genes linked to the innate inflammatory response (PNPLA3) and to proteolytic activity (TLL-1) were significantly associated with a worse outcome of COVID-19.
Although no precise pathogenetic pathway can be defined from these observations, some points deserve consideration. Viral infections are detected by the host innate immune system through pattern recognition receptors (PRRs), which are activated by pathogen-associated molecular patterns (PAMPs) and induce interferon (IFN) signalling. Type III IFNs, or lambda IFNs, use a heterodimeric receptor (IFNLR1-IL10R2) expressed mainly on epithelial cells. Supporting the key role of type III IFNs in regulating the immune response, SNPs in IFNL genes have been strongly associated with the outcomes of viral infections [14]. Homozygosity for the IFNL3 (rs12979860) and IFNL4 (rs368234815) variants, overrepresented in people of African descent, has been reported to be associated with reduced viral clearance in children with acute respiratory infections caused by rhinovirus and coronavirus [11]. Since subjects with the rs12979860 CC and rs368234815 TT haplotype were reported to clear RNA viruses more effectively, possibly because of an up-regulation of inflammatory pathways, we aimed to assess whether the outcome of SARS-CoV-2 infection is conditioned by these polymorphisms. Our results suggest that IFNL3/IFNL4 polymorphic status affects neither the rate of infection, since the genotypes are fully in Hardy-Weinberg equilibrium, nor the likelihood of a severe outcome of COVID-19. Hence, our data would not support the proposed use of IFN lambdas as antivirals in COVID-19 patients or in subjects at high risk of infection, currently being tested in clinical trials with peg-IFN lambda-1 [15].
COVID-19 pneumonia is characterized by an inflammatory exudate of monocytes and lymphocytes. Lung tissues show an abnormal accumulation of CD4+ helper T lymphocytes and of CD163+ M2 macrophages recruited by type II pneumocytes in the alveolar spaces. The immunohistochemical evidence, showing strong positivity and site-specific expression, suggests that M2 macrophages play a key role in COVID-19 pathogenesis [16]. Mer tyrosine kinase (MERTK) is a major macrophage receptor involved in the clearance of apoptotic cells and is expressed mainly in the M2 macrophage subpopulation [17]. Polymorphisms of the MERTK gene can influence its expression, conditioning the M2 polarization of resident macrophages. Genotyping of our patients shows that MERTK polymorphic status affects neither the rate of infection, the genotypes being in Hardy-Weinberg equilibrium, nor the likelihood of a severe outcome of COVID-19.
Patatin-like phospholipase domain-containing 3 (PNPLA3) is a triacylglycerol lipase that mediates triacylglycerol hydrolysis in adipocytes. The PNPLA3 missense variant rs738409 (C>G, Ile148Met), which causes loss of function, is associated with overexpression of the NLRP3 inflammasome, leading to increased serum levels of IL-1β and IL-18 [18]. Many viruses, SARS-CoV-2 among them, can directly induce activation of the NLRP3 inflammasome, leading to the cytokine storm that probably causes most fatal outcomes [19]. In our subcohort of patients aged ≤65 years, the PNPLA3 GG genotype was significantly associated with an increased risk of severe outcome. It is thus conceivable that subjects carrying the rs738409 GG genotype, having a constitutive upregulation of the NLRP3 inflammasome, develop more severe tissue damage when infected by SARS-CoV-2.
Tolloid-like 1 (TLL-1), a gene encoding an astacin-like, zinc-dependent metalloprotease belonging to the peptidase M12A family, may have a role in modulating the clinical expression of COVID-19. The TLL-1 catalytic domain shows relative promiscuity, and many TLL-1 substrates are known.
The spike (S) protein, a trimeric transmembrane protein of SARS-CoV-2, is essential, after cleavage of its ectodomain, for viral binding to and entry into host cells via the angiotensin-converting enzyme 2 (ACE2) receptor. Cleavage of S into subunits is the fundamental step for viral entry into uninfected cells, and SARS-CoV-2 has developed several strategies for proteolytic activation that exploit a large number of host proteases. These include furin, trypsin and transmembrane protease/serine (TMPRSS) proteases, which cleave S in the Golgi apparatus or during endosomal uptake of the virus, and cathepsins, which cleave S during virus entry [20]. A role of the interaction of TLL-1 with the complement defence collagens C1q and mannose-binding lectin in triggering complement activation during inflammation and tissue repair has also been described [21]. Among our younger patients, homozygosity for the TLL-1 rs17047200 TT genotype was significantly associated with an increased risk of severe outcome, although less strongly than the PNPLA3 GG genotype.
Although these data suggest that PNPLA3 and TLL-1 SNPs may modulate the course of SARS-CoV-2 infection, we must acknowledge some limitations of our study. The cohort was limited in size, which may have prevented some of the uncommon SNPs from reaching significance because of the small number of patients carrying each variant. Along the same lines, the relatively low number of unfavourable outcomes may have limited the significance of some allelic variants. Accrual of the cohort through a referral laboratory may have introduced selection and information retrieval biases, and an external validation group was not tested. Last but not least, a whole genome sequencing approach, rather than a spot evaluation of a few individual parameters, would have provided more information. This approach, although desirable, was made impossible by the lockdown phase at the time the study was performed at our Institution.
Patients in our cohort were enrolled during the early phase of local spread of the infection, at a time when pre-existing immunity in the general population can be assumed to have been non-existent. In this group, the distribution of the low-frequency PNPLA3 rs738409 GG genotype was consistent with Hardy-Weinberg equilibrium. When assessing whether the polymorphic status of TLL-1 rs17047200 (A>T) is related to the outcome of COVID-19 patients, we found that the genotype distribution of this SNP in our cohort was not entirely consistent with Hardy-Weinberg equilibrium, suggesting a possible predisposing role to SARS-CoV-2 infection. Over the last weeks, there has been a widespread, albeit as yet unsubstantiated, impression that the clinical expression of COVID-19 has become less aggressive and that new cases occurring in areas where SARS-CoV-2 is still actively spreading are less severe. It is unclear whether this may be due to an attenuation of viral pathogenicity, to environmental factors or to population characteristics. While the current low number of COVID-19 deaths in our region cannot reduce the number of subjects with the "unfavourable" genotype, a further unfavourable evolution of the pandemic could possibly reduce the number of subjects carrying it.
In conclusion, we feel that polymorphisms of host genetic determinants, especially those related to the innate inflammatory response to SARS-CoV-2, should be carefully assessed with a wide-ranging approach, aiming to develop gene profiling tools that support early prediction of the course of COVID-19 at the individual level. If host polymorphisms are confirmed as determinants of severity, new strategies for identifying vulnerable populations or patients at higher risk for severe disease could be implemented and promoted, improving the diagnosis, treatment and prognosis of COVID-19.

Table 6. Top 20 cleavage sites predicted with SitePrediction, ranked by average score. The amino acid sequence of the Spike protein from UniProt was used as the substrate, and 17 known cleavage sites for TLL-1 were retrieved from MEROPS. Each row represents a cleavage site within the Spike protein, with its position and site.