An ancient motif unique to human STING, RGS12 and SARS-CoV-2 spike proteins


The COVID-19 pandemic is caused by the beta-coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) 1-3 . There have been more than seventy million cases and more than one and a half million deaths 9 . COVID-19 was first described as a pneumonia of unknown cause, and it has gastrointestinal, cardiovascular, immunological and neurological complications. Its most serious complication is hypoxemia due to respiratory failure, and many patients die from acute respiratory distress syndrome (ARDS) 1-3 . Venous and arterial thrombosis is very common in COVID-19: thrombotic abnormalities and cardiovascular complications lead to ischemic stroke, myocardial infarction and venous thromboembolism, and they contribute to multisystem organ dysfunction 10 . The pathophysiology is not fully defined, but the spike protein of the virus mediates entry into host cells 1,3, 11 . There is no proven treatment for COVID-19 12 . Angiotensin-converting enzyme 2 (ACE2) was identified as the main receptor for the spike protein 1, 11 . However, ACE2 is absent on T cells 13 , and the interaction between ACE2 and the spike protein is not sufficient to explain the mechanism(s) of the coagulopathy and cytokine storm of COVID-19.
GxxxG motifs belong to the small-xxx-small family of short linear motifs, which play important roles in protein-protein interactions 5,6 , including those of viral proteins 14 . These motifs are involved in molecular mimicry and in the evolutionary arms race 15 . Results on the role of the GxxxG motif in the coronavirus spike protein are conflicting 16,17 . The aim of our study was to investigate the GxxxG (GG4) motifs of the SARS-CoV-2 spike protein using bioinformatic methods.
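As a concrete illustration of these motif definitions, the following Python sketch lists GxxxG and, more loosely, small-xxx-small motifs in a protein sequence. It assumes the "small" residues are G, A and S (a common choice, exposed as parameters) and reports 1-based start positions to match UniProt residue numbering. A zero-width lookahead is used so that overlapping motifs, such as both halves of GxxxGxxxG, are all reported. The function names and the test sequence are hypothetical.

```python
import re

def find_small_xxx_small(seq: str, first: str = "GAS", last: str = "GAS") -> list[tuple[int, str]]:
    """Return (1-based start, 5-residue match) for every [first]xxx[last] motif.

    The zero-width lookahead (?=...) lets overlapping motifs be reported.
    """
    pattern = re.compile(f"(?=([{first}]...[{last}]))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]

def find_gxxxg(seq: str) -> list[tuple[int, str]]:
    """Strict GxxxG: both flanking positions must be glycine."""
    return find_small_xxx_small(seq, first="G", last="G")

toy = "MGAAAGLLLAK"  # hypothetical sequence, not from any real protein
print(find_gxxxg(toy))            # [(2, 'GAAAG')]
print(find_small_xxx_small(toy))  # [(2, 'GAAAG'), (6, 'GLLLA')]
```

Passing `first="ALS", last="G"` would approximate the semi-GG4 definition used in the figure legend, where the first small residue is A, L or S and the second is G.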

#1. Unique and evolutionarily conserved motif
There was more than one GG4 motif on the spike protein, but one of them was rich in aromatic amino acids. The first amino acid of this motif was alanine, not glycine. We named this small-xxx-small motif the "semi-GG4 motif"; it showed motif similarity with the stimulator of interferon genes (STING) proteins, indicating molecular mimicry of STING (Fig. 1). The presence of this motif on the STING proteins of many species, including Nematostella vectensis, shows that the motif has been evolutionarily conserved on STING proteins since the Cnidarians (Fig. 1). We were surprised to find that this motif mimicry is unique to SARS-CoV-2 and STING proteins (Fig. 1): our search for the motif in UniProt showed that it is not found on any other protein.
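A motif search of this kind can be reproduced locally by scanning FASTA records with a regular expression. The sketch below assumes the motif is encoded as the pattern [AS]YY[FIV]GYL (the form given in the "Hub motif" section) and uses a minimal, hand-written FASTA reader rather than any particular library. The record names and sequences are invented for illustration; a real search would run over a full UniProt FASTA download.

```python
import re

# Unique semi-GG4 motif written as [AS]YY[FIV]GYL; lookahead allows overlapping hits.
UNIQUE_MOTIF = re.compile(r"(?=([AS]YY[FIV]GYL))")

def read_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA reader: maps the first word of each header to its sequence."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records

def scan(records: dict[str, str], pattern: re.Pattern = UNIQUE_MOTIF) -> list[tuple[str, int, str]]:
    """Return (record name, 1-based start, matched peptide) for every hit."""
    hits = []
    for name, seq in records.items():
        for m in pattern.finditer(seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return hits

# Hypothetical records for illustration only.
fasta = """>toy_hit
MKTAYYVGYLQP
>toy_miss
MKTAYYWGYLQP
"""
print(scan(read_fasta(fasta)))  # [('toy_hit', 4, 'AYYVGYL')]
```

The second record fails because W is not one of F, I or V at the fourth position, which shows how narrowly the strict pattern is defined.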

#2. The motif and other coronaviruses
A similar aromatic amino acid-rich semi-GG4 motif was found on the spike proteins of all coronaviruses, sharing a common motif (Fig. 1).

#3. The spike group of the 21st century
Cluster analysis of the spike proteins of coronaviruses isolated from humans showed that they can be classified into three subgroups, which we named the first (1s), second (2s) and third (3s) spike groups. The third spike group (3s) has only three members: MERS, SARS (SARS-CoV) and SARS2 (SARS-CoV-2), the causes of the serious infections of the 21st century (the 21st century group) (Fig. 2). The Venn diagram showed that these 21st-century pathogens form a separate group among the human beta-coronavirus spike proteins (Fig. 3).
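The grouping step can be illustrated with a small average-linkage (UPGMA) clustering written in plain Python. This is a sketch under stated assumptions: the sequences are taken as already aligned (for example by Clustal Omega), distance is the p-distance (fraction of differing aligned positions, gap columns skipped), and average linkage is chosen because it corresponds to R's `hclust(..., method = "average")`. The distance measure and linkage actually used in this study are not specified, and the four short "sequences" are invented.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned sequences, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma(seqs: dict[str, str]) -> list[tuple[frozenset, frozenset, float]]:
    """Average-linkage agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) in merge order;
    the distance between two clusters is the mean of all pairwise leaf distances.
    """
    leaf_d = {frozenset((i, j)): p_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}
    clusters = [frozenset([name]) for name in seqs]
    history = []

    def avg(c1: frozenset, c2: frozenset) -> float:
        return sum(leaf_d[frozenset((i, j))] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        history.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return history

# Hypothetical pre-aligned toy sequences (not real spike proteins).
aligned = {
    "A": "AYYVGYLQPRTF",
    "B": "AYYVGYLQPKTF",
    "C": "SYYFGYLKPKSF",
    "D": "SYYFGYLKPKSW",
}
for a, b, d in upgma(aligned):
    print(sorted(a), sorted(b), round(d, 3))
```

The merge history is exactly what a dendrogram draws: each row joins two existing clusters at the recorded height, so cutting the tree at a chosen height yields subgroups such as 1s, 2s and 3s.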
The clinical differences between MERS, SARS-CoV and SARS-CoV-2 are well known 18 , and the small-xxx-small motifs of these three spike proteins differ in a comparable way in their aromatic amino acids. The MERS motif contains only one aromatic amino acid, phenylalanine (Phe); the SARS-CoV motif contains three (2 Tyr + 1 Phe); and the three aromatic amino acids of the SARS-CoV-2 motif are all tyrosine (Tyr) (Fig. 2).
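The aromatic comparison reduces to counting phenylalanine, tryptophan and tyrosine in each seven-residue motif. In the sketch below, histidine is deliberately not counted as aromatic (an assumption), and the only example peptide, AYYVGYL, is one that satisfies [AS]YY[FIV]GYL with three Tyr, the composition described for SARS-CoV-2. The MERS and SARS-CoV motif sequences are shown in Fig. 2 and are not reproduced here.

```python
from collections import Counter

# Aromatic residues counted here: Phe, Trp, Tyr (His deliberately excluded).
AROMATIC = set("FWY")

def aromatic_profile(motif: str) -> dict[str, int]:
    """Count aromatic residues in a peptide, broken down by residue type."""
    counts = Counter(res for res in motif.upper() if res in AROMATIC)
    return {"total": sum(counts.values()), "Y": counts["Y"], "F": counts["F"], "W": counts["W"]}

print(aromatic_profile("AYYVGYL"))  # {'total': 3, 'Y': 3, 'F': 0, 'W': 0}
```

Reporting Tyr separately from the total keeps the two quantities compared in the text (aromatic count and Tyr content) distinct.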

#4. Tyrosine
Tyrosine is important for the structure and function of proteins 19,20 . The number of aromatic amino acids, and in particular the Tyr content, of the semi-GG4 motif in the 3s subgroup parallels the pathogenic potential of these viruses: the number of Tyr residues is highest in the motif of SARS-CoV-2 (Fig. 2), which is more contagious than the others 3 . The unique semi-GG4 motif spans residues 264-270 of the N-terminal domain (NTD) of the spike protein (UniProt accession number: P0DTC2) (Figs. 4a,b). The NTD has been reported as an additional binding region of SARS-CoV-2 [21][22][23] . In the STING protein, the unique motif is also the binding site of the cyclic dinucleotide ligand c-di-GMP 24 .
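Because UniProt numbering is 1-based and inclusive while Python slicing is 0-based and half-open, extracting a range such as residues 264-270 is an easy place for off-by-one errors. The helper below, whose name is hypothetical, makes the conversion explicit. It assumes the full P0DTC2 sequence has already been loaded as a string; the demonstration uses a synthetic stand-in sequence rather than the real spike sequence.

```python
def residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering (UniProt style)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"range {start}-{end} outside sequence of length {len(seq)}")
    # 1-based inclusive [start, end] maps to the 0-based half-open slice [start - 1, end)
    return seq[start - 1:end]

# Synthetic stand-in: 263 placeholder residues followed by a motif-like peptide.
demo = "X" * 263 + "AYYVGYL" + "QP"
print(residues(demo, 264, 270))  # AYYVGYL
print(residues(demo, 266, 266))  # Y
```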

#6. GYL triplet
The GYL triplet of the unique motif is found on the spike proteins of SARS-CoV and SARS-CoV-2 but not on that of MERS (Fig. 2), suggesting a possible role for the GYL triplet in the different pathogenic properties of MERS, SARS-CoV and SARS-CoV-2. An IGY motif, previously reported on secreted toxic proteins of fungi 25 , is also found within the unique motif, suggesting a role for evolutionary mechanisms (Fig. 3).

#7. Aromatic cage
Tyrosine at position 266 (Y266) is found only on SARS-CoV-2, not on SARS-CoV or MERS (Figs. 2, 4b). Y266 makes a hydrophobic contact with W64, and together they form an aromatic cage. Y266 is also attached to R214 and A93 of the neighbouring beta-sheets (Fig. 4c), contributing to a more stable structure, as reported for some other proteins 26,27 . Another aromatic cage lies just next to Y266 (Fig. 4d), showing that the unique motif sits in an area rich in aromatic cages. This structure is found only on the spike of SARS-CoV-2 and not on the other members of the 3s group, because SARS-CoV and MERS lack Y266 and W64 (Figs. 2, 5c). The unique motif is rich in aromatic amino acids and aromatic cages (Figs. 4c,d). Aromatic cages usually capture positively charged molecules and amino acids such as lysine 26 , but there are no data on this aromatic cage of the spike protein, and its role in virus-host interactions is unknown.
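Aromatic contacts such as Y266-W64 are commonly assessed from the distance between ring centroids. The sketch below assumes the ring-atom coordinates have already been extracted from a structure (for example the 6XEY entry) and uses a 7.0 Å centroid cutoff, which is an assumed, adjustable threshold rather than a value from this study. The hexagonal test rings are synthetic, not real coordinates.

```python
import math

Coord = tuple[float, float, float]

def centroid(atoms: list[Coord]) -> Coord:
    """Geometric centre of a set of ring atoms."""
    n = len(atoms)
    return (sum(a[0] for a in atoms) / n,
            sum(a[1] for a in atoms) / n,
            sum(a[2] for a in atoms) / n)

def ring_contact(ring_a: list[Coord], ring_b: list[Coord], cutoff: float = 7.0) -> tuple[float, bool]:
    """Return (centroid-centroid distance in Å, whether it is within the assumed cutoff)."""
    d = math.dist(centroid(ring_a), centroid(ring_b))
    return d, d <= cutoff

def hexagon(center: Coord, radius: float = 1.39) -> list[Coord]:
    """Synthetic planar six-membered ring (stand-in for real ring-atom coordinates)."""
    cx, cy, cz = center
    return [(cx + radius * math.cos(k * math.pi / 3),
             cy + radius * math.sin(k * math.pi / 3),
             cz) for k in range(6)]

stacked = ring_contact(hexagon((0.0, 0.0, 0.0)), hexagon((0.0, 0.0, 4.0)))
distant = ring_contact(hexagon((0.0, 0.0, 0.0)), hexagon((10.0, 0.0, 0.0)))
print(stacked, distant)
```

A centroid distance alone ignores ring orientation; a fuller analysis would also compare the angle between ring normals to distinguish stacked from edge-to-face arrangements.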

#8. STING protein
Free DNA in the cytoplasm is abnormal and triggers STING signaling. Cytosolic DNA, including viral DNA, is sensed by cyclic GMP-AMP synthase (cGAS), which produces the cyclic dinucleotide cGAMP; cyclic dinucleotides such as cGAMP and c-di-GMP activate STING proteins 28 . Activated STING is important in autophagy 29 , cytokine release 1 , coagulation 30 , obesity 31,32 and aging 33 . These processes and conditions are all linked to COVID-19, and all are affected by STING proteins. The unique motif is the c-di-GMP binding site of the STING protein 24 , suggesting a molecular mimicry that could enable a direct interaction between STING and the spike protein of SARS-CoV-2 (Figs. 1,2). Such a direct interaction through the c-di-GMP binding site would be a mechanism distinct from the cyclic guanosine monophosphate-adenosine monophosphate synthase-stimulator of interferon genes (cGAS-STING) pathway, and, in addition to the cGAS-STING pathway, it would result in hyperactivation of STING proteins. STING activation plays a role in vascular and pulmonary pathologies 30 and is a major driver of the induction of neutrophil extracellular traps 34 , which contribute to immunothrombosis 35 . Activated

#9. RGS12
We found another small-xxx-small (semi-GG4) motif shared by the spike protein of SARS-CoV-2 and the regulator of G protein signaling 12 (RGS12) proteins. It is similar, but not identical, to the unique motif of the STING proteins. A search for this motif showed that it is also unique, found only on RGS12 and the spike protein of SARS-CoV-2. RGS12 was recently reported to play a key role in inflammatory reactions 7 , suggesting a significant contribution to the pathogenesis of COVID-19. Our results do not indicate any role for the STING and RGS12 proteins in pain, anosmia, ageusia, sex differences or the impact of air pollution on COVID-19.

#10. TRPM ion channels
A surprising motif similarity between the spike protein of SARS-CoV-2 and a group of TRPM ion channels (TRPM1-TRPM4) is another example of molecular mimicry, which we did not investigate further because this motif was very different from the unique motif (Extended Data Fig. 2).

#11. Mycobacterium tuberculosis
RGS12 and STING proteins are not specific to the respiratory system, but co-infections with major pathogens have been reported that may converge in a "perfect storm" 39 . This motif similarity and the molecular mimicry may help us understand the interaction between tuberculosis and COVID-19.

#12. C1QT4
It was also very surprising to find a motif very similar (but not identical) to the unique motif on the beta-lactamase enzymes of M. tuberculosis and the STING proteins (Extended Data Fig. 4), and also on C1q tumor necrosis factor-related protein 4 (FASTA name C1QT4) (Extended Data Fig. 4a). The motif similarity between M. tuberculosis beta-lactamase and C1QT4 was higher than that with the STING proteins (Extended Data Fig. 4b). High levels of IL-6 are one of the predictors of severity in COVID-19 40 . C1QT4 is one of the major inducers of IL-6 and plays a role in viral infections 41 , suggesting a role for C1QT4 in COVID-19 that has not been reported before and supporting our results.

#13. Archaea and evolution
It was interesting to find the same unique motif similarity between the spike proteins, the ribosomal protein of Methanospirillum hungatei and a membrane protein of Methanococcus maripaludis (Extended Data Fig. 4c). These prokaryotes are anaerobic methanogenic Archaea.

#14. Hub motif
A small motif enabling molecular mimicry with so many proteins, most if not all of which are involved in inflammatory reactions, shows that although the motif is short (only 7 amino acids), it is not functionally simple. Written as [AS]YY[FIV]GYL, the motif is unique to STING proteins and the spike protein of SARS-CoV-2, but the relaxed motif [AS]xxxGYL is found on many proteins, including beta-lactamase, C1QT4, RGS12 and proteins of Archaea. This suggests that the motif is a "hub motif" 44 with many other features awaiting discovery.
The motif is an evolutionarily conserved "hub motif", and the STING protein is possibly a "hub protein" 44 .
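The difference between the strict and relaxed forms of the motif can be expressed as two regular expressions, with 'x' written as '.'. Every peptide that matches the strict pattern also matches the relaxed one, but not the reverse. The classification labels and test peptides below are hypothetical.

```python
import re

STRICT = re.compile(r"[AS]YY[FIV]GYL")  # strict form of the unique motif
RELAXED = re.compile(r"[AS]...GYL")     # relaxed form [AS]xxxGYL; '.' stands for any residue

def classify(peptide: str) -> str:
    """Label a peptide as matching the strict motif, only the relaxed one, or neither."""
    if STRICT.search(peptide):
        return "strict"
    if RELAXED.search(peptide):
        return "relaxed only"
    return "none"

for p in ["AYYVGYL", "SKDEGYL", "GYYVGYL"]:  # hypothetical peptides
    print(p, classify(p))
```

Checking the strict pattern first matters: because the strict set is a subset of the relaxed set, the reverse order would label every strict match as merely relaxed.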

#15. Beta-lactamase
There was a second motif shared by beta-lactamase and the spike protein of SARS-CoV-2. This second motif lies adjacent to the unique motif (Extended Data Fig. 4d), suggesting an unusual interaction between beta-lactamase and the spike proteins. Based on this surprising molecular mimicry, we suggest that classical beta-lactamase inhibitors may inhibit some of the pathological effects of COVID-19. There is no proven drug for COVID-19 45 , and based on our results (Extended Data Fig. 4), we expect beta-lactamase inhibitors to be effective, at least by reducing IL-6 levels. Because beta-lactamase inhibitors are already in clinical use, they could be applied to patients without delay.

#17. Molecular mimicry, evolution, unique motif and beta-lactamase
The importance of the GxxxG motif in SARS-CoV-2 proteins has been reported 46 . Mimicry and molecular mimicry are among the mechanisms of the evolutionary arms race 47-49 , and mimicry has been proposed as a mechanism to explain multi-organ damage in COVID-19 50 .
The aim of our study was not to investigate the interactions and roles of STING, RGS12 and C1QT4 proteins or beta-lactamase enzymes in COVID-19, but the unique motif led us to these proteins and, to our surprise, to beta-lactamase inhibitors. Our results show the importance and presence of

Data Reporting
Cluster analysis was performed on the proteins to generate the dendrogram and Venn diagrams. No other statistical methods were used in the study. All protein sequences are available from UniProt, and the structure of the SARS-CoV-2 spike protein used in the study is PDB ID 6XEY.

Author Contributions
Both authors contributed equally to the study.

Declaration of interests
We declare no competing interests.

Additional information
Supplementary information, including the unshaded multiple comparisons for all figures, is provided as a single PDF file.
Correspondence and requests for materials should be addressed to S.A.

Dendrogram of the phylogenetic relationships of the STING and the spike proteins of human betacoronaviruses, based on alignment of amino acid sequences generated by Clustal Omega and hierarchical cluster analysis using R. 'x' denotes any amino acid. The first small amino acid in the small-xxx-small motifs is A, L or S, and the other small amino acid is G, making the semi-GG4 motif. 1s = first group, 2s = second group, 3s = third group. Members of the 3s group are the causes of the viral outbreaks of the 21st century and the ongoing COVID-19 pandemic.

The unique motif (A) is on the NTD of the spike protein, marked with stars; (B) is located on one of the beta-sheets, with a finger-like loop extending outside; (C) there is a hydrogen bond between R214 of the neighbouring beta-sheet and Y266, which is found only on the SARS-CoV-2 member of the 3s group, and Y266 forms an aromatic cage with W64, which is found only on SARS-CoV-2; (D) another aromatic cage lies around A93, so the unique motif is surrounded by aromatic cages.