New Short RNA Motifs Potentially Relevant in the Severe Acute Respiratory Syndrome Coronavirus 2 Genome

As time passes, identifying new pharmacological targets is becoming more difficult, and new strategies to tackle the problem will soon be needed. The coronavirus disease outbreak caused by severe acute respiratory syndrome coronavirus 2, a current threat to human health, illustrates this need. The present study aimed to collect a set of short RNA motifs with potential biological impact, most of which have not been observed before. By categorizing RNA triplets by their gross composition, the study collected 88 short RNA motifs shared by most coronavirus genera, independent of the percent identity between genomes. The selected motifs are built from the overlapping (nearest-neighbour) triplets formed by the bases A, T, G and A, C, G. The high percent identity between severe acute respiratory syndrome coronavirus genomes makes it difficult for current methods to find these motifs. The results provide 50 motifs in the orf encoding polyprotein 1a, 27 in the orf encoding polyprotein 1b, 5 in the spike-encoding orf and 6 in the nucleocapsid-encoding orf. They also provide insights into the validity of the procedure: some motifs are interspersed within, or attached to, known functionally relevant fragments of the genome, although most have not yet been associated with any known function. The high level of preservation of these motifs across most coronavirus genera suggests that they could be used in diagnostics, in vaccines, or as substrates for protease inhibitor studies.


Introduction
The Coronaviridae family contains enveloped, positive-sense, single-stranded RNA viruses. The coronavirus (CoV) taxonomy of the Coronavirus Study Group of the International Committee on Taxonomy of Viruses (ICTV) classifies CoVs into four genera: alpha, beta, gamma and delta [1][2][3]. Bat CoVs are likely the gene source of the alpha and beta genera, and avian CoVs are likely the gene source of the gamma and delta genera [3][4][5].
In recent years there have been three zoonotic outbreaks of beta-CoVs: severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1), Middle East respiratory syndrome coronavirus (MERS-CoV) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) 6. The pandemic caused by beta-SARS-CoV-2 has spread worldwide 7,8. After the beta-SARS-CoV-2 genome was isolated 9, a study suggested that bats and pangolins could be natural reservoirs of SARS-CoV-2, highlighting the high identity between the genomes of the CoVs from these species and human SARS-CoV-2 10. The high genome identity observed among beta-SARS-CoV genomes makes it difficult to devise a strategy for distinguishing highly conserved short RNA motifs of possible biological interest. Consequently, an approach that is independent of the percent identity between genomes would be desirable.
Categorizing NT triplets by their gross composition has proved useful for problems involving coevolution or gene clustering 11,12; an analysis of distant picornavirus strains, such as foot-and-mouth disease virus and human rhinovirus, revealed NT sequence correlations not observed before 13. Briefly, the strategy works as follows: i) the genome is read fully overlapping, so that the context of each NT is taken into account; ii) each NT triplet is categorized by its gross composition. These categories, called triplet composons (tCPs), generate a tCP sequence that can be compared with other tCP sequences. Comparing tCP sequences gives access to short, tCP-conserved RNA motifs, which are hard to find when the percent identity among genomes is very high. The short RNA motifs collected here are clearly distinct from known short linear motifs (SLiMs) 14, which are peptides that mediate protein-protein interactions through a functional interface encoded in a short, poorly conserved sequence 15. The goal of the present study was to provide proof of concept for a rational approach to obtaining short, highly conserved and potentially relevant RNA motifs.

Methods
Data availability. The CoV genomes were obtained from GenBank®, https://www.ncbi.nlm.nih.gov/ 16. The genomes studied were: human coronavirus (HCoV) 17, porcine epidemic diarrhoea virus (PEDV) 18, human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) 19, human severe acute respiratory syndrome coronavirus (SARS-CoV-1) 20, civet severe acute respiratory syndrome coronavirus (civet SARS-CoV) 21, mouse hepatitis virus (MHV) 22, bat severe acute respiratory syndrome coronavirus (bat SARS-CoV) 23, avian infectious bronchitis virus (IBV) 24 and common moorhen coronavirus 3. The strain, genus, accession number and references for each CoV are given in Table 1. Abbreviations are taken from the International Committee on Taxonomy of Viruses (ICTV), https://talk.ictvonline.org/. Throughout the paper, T is listed as a nucleotide. Since the viral genomes are RNA, U should strictly be substituted for T. However, RNA is very often reverse-transcribed into DNA before sequencing, which is why GenBank presents single-stranded RNA viral genomes with T. Moreover, NCBI support indicates that replacing U with T is a GenBank convention that saves computational resources.
Table 1. Genomes of different CoV genera
Numerical analysis. Similarities and dissimilarities among genomes of different CoV genera were analysed with the tCP method 12. The method is based on exclusionary multiplet categorizations characterized by the presence or absence of particular bases. Fourteen categorizations, or tCPs, were identified (Table 2), each containing either one NT triplet (non-degenerate) or six NT triplets (degenerate) 11. The RNA was read fully overlapping, which avoids information loss, guarantees that all triplets are considered and thus takes into account the context of each NT in the RNA sequence. The tCP distribution in CoV genomes is represented in cumulative tCP usage frequency graphs.
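The categorization step can be sketched in Python. This is a minimal sketch, assuming that a tCP is simply the set of distinct bases in a triplet; that assumption reproduces the 14 categories with one or six triplets each described above, but Table 2 itself is not reproduced here. Labels are written as sorted base sets, so the sorted label "ACG" corresponds to <AGC> in the text. The names `tcp_of`, `tcp_sequence` and `TCP_CLASSES` are hypothetical, and labelling triplets that contain ambiguity codes (e.g. N) as "N" is an illustrative choice, not part of the method.

```python
from itertools import product

BASES = "ACGT"  # GenBank convention: T stands for U in the RNA genome


def tcp_of(triplet: str) -> str:
    """Triplet composon (tCP): the sorted set of distinct bases in a triplet.

    Assumption: triplets containing ambiguity codes (e.g. N) are labelled "N".
    """
    bases = set(triplet.upper())
    if not bases <= set(BASES):
        return "N"
    return "".join(sorted(bases))


def tcp_sequence(genome: str) -> list[str]:
    """Read the genome fully overlapping (step of one NT): one tCP per start position."""
    genome = genome.upper()
    return [tcp_of(genome[i:i + 3]) for i in range(len(genome) - 2)]


# Group the 64 triplets by tCP: 4 non-degenerate classes (<A>, <C>, <G>, <T>)
# plus 10 degenerate classes of six triplets each (6 two-base, 4 three-base).
TCP_CLASSES: dict[str, list[str]] = {}
for combo in product(BASES, repeat=3):
    triplet = "".join(combo)
    TCP_CLASSES.setdefault(tcp_of(triplet), []).append(triplet)

if __name__ == "__main__":
    for tcp, triplets in sorted(TCP_CLASSES.items(), key=lambda kv: (len(kv[0]), kv[0])):
        print(f"<{tcp}>", len(triplets), " ".join(triplets))
    print(tcp_sequence("ATGAC"))
```

Reading fully overlapping yields n − 2 tCPs for a sequence of n NTs, so every NT except the last two starts a triplet and each NT appears in up to three triplets.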
The cumulative frequency at a given position is the number of appearances of a tCP along the sequence up to that position. To simplify, the cumulative graph and its regression line were projected onto the axis representing genome length by subtracting the cumulative tCP usage frequency from the regression line; the result is hereafter called the 'tCP profile' 12. The profile represents the distribution of differences between the observed tCP events, tCPo, and the estimated tCP events, tCPe. Two RNA sequences were considered to share a similar tCP profile when the Pearson correlation coefficient, r, was equal to or higher than an arbitrary cut-off chosen on the basis of the p-value (significance level). Two cut-offs were considered for comparison: 0.7≤r<0.8 and r≥0.8. Because the significance of r depends on the sample size, and the sample size in this analysis exceeds 2×10⁴ points, the correlation coefficients obtained are highly significant for both cut-offs (p<0.001). Graphs and statistics were produced with OriginPro 8 SRO V8.0724 (B724), ©OriginLab Corporation.
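The profile construction and the correlation test can be sketched in plain Python. This is a minimal sketch, assuming that the two tCP sequences have already been aligned to the same length (the text does not specify how gaps are handled) and that the regression line is an ordinary least-squares fit of cumulative counts against position. The profile here is computed as observed minus fitted (tCPo − tCPe); the text describes the opposite subtraction, but as long as both profiles use the same convention the Pearson r is unchanged. The names `tcp_profile`, `pearson_r` and `t_statistic` are hypothetical; the t statistic is the standard test for a Pearson correlation and shows why r≥0.7 is highly significant for more than 2×10⁴ points.

```python
import math


def tcp_profile(tcps: list[str], target: str) -> list[float]:
    """Cumulative usage of `target` along the sequence minus its least-squares line."""
    n = len(tcps)
    if n < 2:
        raise ValueError("need at least two positions")
    cumulative, count = [], 0
    for label in tcps:
        if label == target:
            count += 1
        cumulative.append(float(count))
    xs = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(cumulative) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cumulative))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Observed (tCPo) minus estimated (tCPe) at every position.
    return [y - (intercept + slope * x) for x, y in zip(xs, cumulative)]


def pearson_r(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("profiles must have equal length >= 2")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(va * vb)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Toy aligned tCP sequences (hypothetical data, not CoV genomes).
    a = ["AGT", "C", "AGT", "AGT", "G", "C", "C", "AGT", "G", "AGT"]
    b = ["AGT", "AGT", "C", "AGT", "G", "C", "C", "C", "AGT", "AGT"]
    print(pearson_r(tcp_profile(a, "AGT"), tcp_profile(b, "AGT")))
    print(t_statistic(0.7, 20_000))  # far above the ~3.29 needed for p<0.001
```

With n above 2×10⁴, even r=0.7 gives t of the order of 10², which is why both cut-offs reach p<0.001; the cut-off value, rather than the p-value, is what governs how much variance the profiles share.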
For the method to be applicable, certain criteria must be fulfilled. i) Comparisons in which CoV genomes show a tCP identity near 100% were discarded, because such identity impedes the search for short motifs; thus, the number of shared tCPs must be significantly lower than 14. ii) The viral genomes must be orthologs, because correlation does not imply causation 25 unless we are dealing with highly correlated orthologs 26. iii) Last but not least, distant ortholog genomes must be compared in order to find short RNA motifs conserved across most CoV genera; this criterion is met when beta-SARS-CoVs are compared with genomes of the remaining CoV genera.
Identification of tCPs along CoV genomes. tCPs were identified along the genomes as follows: i) CoV genomes were translated into tCP sequences using Table 2; ii) tCP sequences were compared using the dynamic programming algorithm for global alignment of two sequences 27; iii) the tCP profiles of different CoVs were compared by computing the correlation coefficient between them; iv) if the correlation coefficient of a given pair of tCP profiles was higher than the cut-off considered, the corresponding shared tCP was annotated.
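Step ii) can be illustrated with a textbook Needleman-Wunsch global alignment over tCP labels. This is a sketch under assumed scoring (match +1, mismatch −1, linear gap −1): the text cites the algorithm 27 but not the scores used. Percent identity and gap percentage are computed over alignment columns, which is one plausible reading of the quantities reported in Table 3. `global_align` and `identity_and_gaps` are hypothetical names, and the quadratic memory of this pure-Python version makes it a toy for short sequences; CoV genomes of roughly 30 kb would need a dedicated aligner.

```python
GAP = "-"


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch alignment of two label sequences (e.g. tCP sequences)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1]


def identity_and_gaps(aligned_a, aligned_b):
    """Percent identical columns and percent gapped columns of an alignment."""
    cols = list(zip(aligned_a, aligned_b))
    if not cols:
        return 0.0, 0.0
    same = sum(1 for x, y in cols if x == y and x != GAP)
    gaps = sum(1 for x, y in cols if GAP in (x, y))
    return 100 * same / len(cols), 100 * gaps / len(cols)


if __name__ == "__main__":
    al_a, al_b = global_align(["AGT", "ACG", "C"], ["AGT", "C"])
    print(al_a, al_b, identity_and_gaps(al_a, al_b))
```

Because tCP labels come from a 14-letter alphabet rather than the 4-letter NT alphabet, the same routine aligns either NT or tCP sequences unchanged; only the label type differs.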

Results
Alignment of full-length CoV genomes. We started by comparing genomes of different CoV genera to verify whether they fulfil the requirements for applying the tCP method (Table 3). The beta-SARS-CoV-2 genome shares about 79% NT identity with other beta-SARS-CoV genomes and <51% with other CoV genera, suggesting that beta-SARS-CoVs may have diverged from the other CoV genera. Alpha, gamma and delta-CoVs share <60% NT identity with each other. The genomes of civet SARS-CoV and human SARS-CoV-1, however, share almost 100% NT and tCP identity, supporting the zoonotic nature of human SARS-CoV-1; they differ by a 29-NT sequence retained by the civet isolate and absent from the human isolate 21, and therefore share all 14 tCPs. In all other cases, the tCP identity is significantly lower than the NT identity.
In summary, the selected CoV genomes fulfil the requirements described in Methods and appear adequate for applying the tCP method to comparisons with the SARS-CoV-2 genome.
Comparison of tCP profiles between human beta-SARS-CoV-1 and beta-SARS-CoV-2. The beta-SARS-CoV-2 and beta-SARS-CoV-1 genomes were compared to illustrate which and how many tCPs they share. Six tCPs, <AG>, <CG>, <GT>, <AGT>, <C> and <T>, were shared for r≥0.8 (p<0.001) (Table 4), and these profiles were highly similar (Figure 1). However, significant dissimilarities between the two genomes appeared as the correlation coefficient decreased (Figure 1, profiles <AT> and <G>). The complete panel of all 14 tCP profiles from the comparison (Supporting Figure 1) shows notable dissimilarities, for r<0.7, in the remaining profiles. Short, highly similar fragments are often interspersed in regions of low similarity; such fragments would be of interest in local genome studies but were not considered in this paper.
Upper number in the box: bold, NT percent identity; italics, tCP percent identity. Lower number in the box: percentage of gaps.
Comparison of tCP profiles between beta-SARS-CoV-2 and other CoV genera. NT and tCP percent identities fall to 33% and 46%, respectively, when the beta-SARS-CoV-2 genome is compared with the alpha-HCoV, alpha-PEDV, gamma-IBV and delta-common-moorhen genomes (Table 3). The results indicate that <AGT> is the only conserved tCP, except in the alpha-HCoV genome, which conserves the tCP <G> (Table 4). But what happens with the tCP <G> in HCoV? To answer this question, we further investigated the resemblance of the <AGT> and <G> profiles when the SARS-CoV-2 genome was compared with other alpha-CoVs and with genomes from other CoV genera. Figure 2 shows the profiles of both tCPs obtained from alignments of the SARS-CoV-2 genome with the HCoV, PEDV, IBV and common moorhen CoV genomes, illustrating how both tCPs change along the genomes. For <AGT>, the correlation coefficient ranges from r=0.63 (HCoV) to r=0.92 (IBV) across CoV genera. All <AGT> profiles remain notably similar except in the 5' region of HCoV, whose important dissimilarities explain the lower correlation coefficient for HCoV relative to the other CoVs. However, high identity was observed in the 3' region of the HCoV genome, suggesting that the dissimilarity in the 5' region is species specific. <AGT> is also shared by the SARS-CoVs, suggesting that <AGT> could be evolutionarily important in all CoV genera. These data further support the notion that, even for HCoV, the resemblance of tCP profiles is notable despite the local dissimilarity in the 5' region.
The most pronounced changes, however, were observed for <G>. The correlation coefficient of the <G> profile relative to SARS-CoV-2 changes notably (Figure 2), increasing progressively from avian (r=-0.24) to mammalian CoVs (r=0.8). The profile resemblance observed for <AGT> is therefore lost for <G>, indicating a notable change in <G> usage between avian and mammalian CoVs. As shown in Table 3, the tCP <AGC> is also conserved in most of the CoV genomes analysed for a cut-off r≥0.7 (p<0.001). With this cut-off, at least 49% of the variance is shared when CoV genomes are compared (coefficient of determination r²≥0.7²=0.49), and the high significance of r ensures that this shared variance is not due to chance. Therefore, the resemblance of the <AGC> profiles is also ensured, providing useful evolutionary information. Figure 3 shows a detailed analysis of the <AGC> distribution when the beta-SARS-CoV-2 genome was compared with other beta-SARS-CoV genomes (Figure 3A) and with genomes from other CoV genera (Figure 3B). The <AGC> profile shows, in all CoV genera, a correlation coefficient r>0.7 (p<0.001), except for MHV (r=0.33) and common moorhen CoV (r=0.47), although in both genomes the resemblance in the 5' region was notable (Figure 3B). Indeed, all profiles maintain substantial resemblance, as expected, including those of MHV and common moorhen CoV, although these two show notable dissimilarities in the 3' region. The correlation coefficients for the <AGC> profile do not differ significantly (r≈0.7), except for bat SARS-CoV (r=0.88), indicating evolutionary closeness between human SARS-CoV-2 and bat SARS-CoV. The tCP <AGC> can therefore be considered significantly conserved: it is shared not only by all SARS-CoVs but also by most CoV genera.
In summary, two tCPs, <AGT> and <AGC>, were conserved in nearly all CoV genera. Despite the dissimilarities of <AGT> in the 5' region of the HCoV genome and its high similarity in the 3' region, this is consistent with a common origin for all CoV genera. We suggest, with a high level of confidence, that short RNA motifs generated by the tCPs <AGT> and <AGC> would be shared by most CoVs, independent of the NT similarity between their genomes.
List of short tCP-conserved RNA motifs in CoV genomes. Having established, with a high level of confidence, that the tCPs <AGT> and <AGC> are conserved in most CoV genera, and that <T>, <AG>, <GT>, <AGT> and <AGC> are conserved in beta-SARS-CoVs, we next identified short tCP-conserved RNA motifs shared by most CoV genera.
First, we defined a target selection strategy to isolate short genomic motifs according to the following criteria: i) RNA motifs must be tCP conserved in all beta-SARS-CoVs, which implies that the selected motifs share NT triplets from <T>, <AG>, <GT>, <AGT> and <AGC>; ii) selected motifs must contain more than 50% of NT triplets from the conserved tCPs <AGT> and <AGC>; iii) the chosen RNA fragments should be as long as possible, with limited exceptions for shorter fragments having very high percentages of NTs contained in <AGT> and <AGC>. Following these criteria, Table 5 lists 88 selected RNA motifs together with their associated amino acids. The size, position and percentage of NTs conserved in tCPs <AGT> and <AGC> were also annotated. The selected RNA motifs in SARS-CoV-2 span a wide range of sizes, from 12 to 49 bp (4 to 16 amino acids).
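One way to compute the percentage of NTs contained in <AGT> and <AGC>, as used in criteria ii) and iii), is sketched below: read the motif fully overlapping and count every NT covered by at least one triplet whose tCP is <AGT> or <AGC> (sorted base sets "AGT" and "ACG"). This reading is an assumption, not the authors' documented procedure; applied to motifs 79, 80 and 81 as transcribed in the Discussion, it gives 85%, about 67% and 75%, the figures quoted there for those motifs. `conserved_nt_percent` is a hypothetical name, and only triplets lying entirely inside the motif are considered.

```python
# <AGT> and <AGC>, written as sorted base sets.
CONSERVED_TCPS = frozenset({"AGT", "ACG"})


def conserved_nt_percent(motif: str, conserved=CONSERVED_TCPS) -> float:
    """Percentage of motif NTs covered by at least one triplet with a conserved tCP."""
    motif = motif.upper()
    if not motif:
        raise ValueError("empty motif")
    covered = [False] * len(motif)
    for i in range(len(motif) - 2):  # fully overlapping reading
        if "".join(sorted(set(motif[i:i + 3]))) in conserved:
            for k in range(i, i + 3):
                covered[k] = True
    return 100 * sum(covered) / len(motif)


# Motif sequences as transcribed in the Discussion, with codon spaces removed.
MOTIFS = {
    79: "TAGAGGTGATGAAGTCAGAC",
    80: "TGTATAGATTGTTTAGGAAGT",
    81: "CGGCGGGCACGTAGTGTAGCTAGT",
}

if __name__ == "__main__":
    for number, seq in MOTIFS.items():
        print(number, f"{conserved_nt_percent(seq):.1f}%")
```

Counting covered NTs rather than triplets matters: for motif 81 only 10 of its 22 overlapping triplets fall in <AGT> or <AGC>, yet together they cover 18 of its 24 NTs.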
From here on, the main text refers to the selected RNA motifs using the notation given at the end of Table 5: upper-case letters represent NTs-tCP conserved in all beta-SARS-CoVs; underlined bold upper-case letters represent NTs-tCP conserved in most CoV genera; codons are represented by separated NT triplets in the correct reading frame; and amino acids tCP conserved in most CoV genera are represented by bold upper-case letters below their corresponding codons.
Distribution of NTs-tCP conserved in short selected RNA motifs along the SARS-CoV-2 genome. Table 5 shows differences in the frequency of appearance of NTs-tCP conserved in the selected short RNA motifs. To give a better sense of their distribution along the genome, Figure 4A shows a simple schematic of the SARS-CoV-2 genome organization 28, together with the distribution, at the same scale, of NTs-tCP conserved in the selected RNA motifs (Figure 4B). Comparing Figures 4A and 4B shows that the bulk of the NTs-tCP conserved is located in the replicase-encoding orf 1ab, which is therefore the region most populated with selected RNA motifs. Orf 1ab encodes two overlapping polyproteins (abbreviated pp1a and pp1b) that regulate replication and transcription 17, making it a potentially good target for therapy. The figure compares NTs-tCP conserved between SARS-CoV-2 and other SARS-CoVs (black) and between SARS-CoV-2 and other CoV genera (grey); the differences in frequency between the two cases reflect the different numbers of tCPs conserved in each case (Table 4). The largest motifs appear in the pp1a region, followed by the pp1b region. Four genome stretches of at least 500 bp in the pp1ab-encoding orf do not contain any selected tCP motif.
One of them is located in the 5' region and another in the middle of the pp1a-encoding orf; the other two are in the middle of the pp1b-encoding orf. Thus, despite the high similarity of this genome region, these stretches lack motifs that are tCP conserved in SARS-CoVs and in other CoV genera. The spike-encoding orf also contains broad regions without tCP conservation, especially at the pp1b/spike interface. Many other genome regions lack conserved tCP motifs, such as the orfs encoding the surface, envelope and matrix proteins (Figure 4B).
Table 5.
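Finding stretches without selected motifs, such as the four stretches of at least 500 bp described above, reduces to scanning the gaps between sorted motif intervals. A minimal sketch, assuming 1-based inclusive genome coordinates; `motif_free_stretches` is a hypothetical name, and the region bounds and motif positions in the usage example are invented for illustration because Table 5 and the orf coordinates are not reproduced here.

```python
def motif_free_stretches(motifs, region_start, region_end, min_len=500):
    """Stretches of [region_start, region_end] (1-based, inclusive) free of motifs.

    `motifs` is an iterable of (start, end) intervals; only stretches of at
    least `min_len` NTs are returned.
    """
    stretches = []
    cursor = region_start  # first position not yet covered by a motif
    for start, end in sorted(motifs):
        if end < region_start or start > region_end:
            continue  # motif lies outside the region
        if start - cursor >= min_len:
            stretches.append((cursor, start - 1))
        cursor = max(cursor, end + 1)  # also handles overlapping motifs
    if region_end - cursor + 1 >= min_len:
        stretches.append((cursor, region_end))
    return stretches


if __name__ == "__main__":
    # Hypothetical motif intervals and region, for illustration only.
    example = [(100, 120), (700, 730), (800, 820)]
    print(motif_free_stretches(example, 1, 2000))
```

Sorting first makes the scan linear in the number of motifs, and advancing the cursor with `max` keeps nested or overlapping motif intervals from reopening a covered stretch.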
In summary, the SARS-CoV-2 orfs encoding pp1a, pp1b, the spike polyprotein (Spp) and the nucleocapsid protein contain tCP-conserved fragments separated by genome fragments lacking tCP conservation. A high number of tCP-conserved motifs were observed in the pp1ab-encoding orf. In contrast, fragments and genome regions lacking tCP-conserved motifs were observed especially near the 5' region of the genome.

Discussion
We now discuss whether some of the selected RNA motifs in Table 5 could have biological relevance. As it is not possible to analyse every motif in the table, we focus on motifs 1, 9, 24, 39, 68, 79, 81 and 82, which appear in the literature inserted in or attached to known relevant fragments of CoV genomes. Note that amino acids encoded by motifs conserved across different CoV genera may be conserved for reasons of fitness or survival and could, therefore, be endowed with biological relevance.
The first point to underline is that amino acids encoded by tCP-conserved codons would also be tCP conserved. As a result (see Table 2), the aspartic acid (D)-encoding codons GAT and GAC, the serine (S)-encoding codons AGT and AGC, the valine (V)-encoding codon GTA, the threonine (T)-encoding codon ACG, the arginine (R)-encoding codon CGA, the glutamine (Q)-encoding codon CAG and the alanine (A)-encoding codon GCA would also be conserved, as would the initiation codon (ATG) and the termination codons TGA and TAG. Motif 1 is an example of a highly conserved region surrounding the start of pp1ab.
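The codon list above can be checked mechanically: the six permutations of A, G, T form <AGT>, the six permutations of A, C, G form <AGC>, and translating them with the standard genetic code gives exactly the amino acids and stop codons named in the text. A minimal sketch; the compact codon-table string encodes the standard code (NCBI translation table 1) with codons enumerated in TCAG order, and `codons_of_tcp` is a hypothetical helper.

```python
from itertools import permutations

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons_of_tcp(bases: str) -> list[str]:
    """All codons built from three distinct bases, i.e. one three-base tCP class."""
    return sorted("".join(p) for p in permutations(bases))


if __name__ == "__main__":
    for tcp in ("AGT", "AGC"):
        for codon in codons_of_tcp(tcp):
            print(f"<{tcp}>", codon, CODON_TABLE[codon])  # '*' marks a stop codon
```

The twelve codons printed are the complete membership of the two conserved tCPs, so no other codon can be tCP conserved in most CoV genera under this categorization.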
There are two identical RNA motifs, (TGTAGCTAGT), encoding the peptide V-A-S (valine-alanine-serine). The motif appears in two regions of the SARS-CoV-2 genome, inserted in the tCP-conserved motifs 68 and 81 (double-underlined), namely (AT GTA GCT AGT TGT GAT GCA) and (CGG CGG GCA CGT AGT GTA GCT AGT), respectively; one is in the pp1ab-encoding orf and the other in the Spp-encoding orf. The V-A-S motif present in the SARS-CoV-2 Spp is absent, however, from the Spp of other beta-SARS-CoVs, namely human SARS-CoV-1 and civet and bat SARS-CoVs; this notable difference between the SARS-CoV-1 and SARS-CoV-2 genomes suggests that the motif found in the Spp is species specific. The motif appears profusely in the pp1ab of beta-SARS-CoVs and also in the pp1ab of alpha-HCoV, but its presence is very limited in other regions of CoV genomes (Supporting Table 1).
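The in-frame peptides quoted for motifs 68 and 81, and for motifs 79, 80 and 82 elsewhere in this Discussion, can be reproduced by translating each transcribed motif from the frame implied by its codon spacing. A minimal sketch using the standard genetic code; the `frame` offsets are read off the spacing in the text, and `translate` and `MOTIFS` are hypothetical names, not a library API.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int) -> str:
    """Translate the complete codons of `seq` starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


# Motif number: (sequence without spaces, reading-frame offset from the codon spacing)
MOTIFS = {
    68: ("ATGTAGCTAGTTGTGATGCA", 2),
    79: ("TAGAGGTGATGAAGTCAGAC", 1),
    80: ("TGTATAGATTGTTTAGGAAGT", 2),
    81: ("CGGCGGGCACGTAGTGTAGCTAGT", 0),
    82: ("TTGGAAAGTATGAGCAGTA", 2),
}

if __name__ == "__main__":
    for number, (seq, frame) in MOTIFS.items():
        peptide = translate(seq, frame)
        print(number, "-".join(peptide), "contains V-A-S:", "VAS" in peptide)
```

Only motifs 68 and 81 contain V-A-S, and both contain the same NT string TGTAGCTAGT, consistent with the two identical RNA motifs described above.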
The V-A-S motif is attached through a highly conserved serine to the R-R-A-R fragment of motif 81 (R-R-A-R-S-V-A-S), a cleavage site for the host protease furin necessary for activation of the Spp of SARS-CoV-1 and MERS-CoV, as proposed by a two-step sequential protease cleavage model 29,30. This supports the importance of the motif in the SARS-CoV Spp, because pp1a and pp1b help to release functional polypeptides through the papain-like protease (PLpro) and the 3C-like protease (3CLpro) located in the non-structural protein (NSp) region, which could serve as substrates for some types of inhibitors. The short motif V-A-S often appears associated with SARS-CoV-1 3CLpro cleavage sites, in canonical recognition positions; more specifically, at the same positions in NSp4/5 and NSp8/9 31. Moreover, sequences containing V-A-S could be considered potential substrates in protease inhibitor studies of SARS-CoVs. The SARS-CoV-1 3CLpro sequence L-V-A-S-T, which contains the V-A-S motif, serves as a substrate in classic protease inhibition assays, labelled with 5-(2′-aminoethyl)aminonaphthalene sulfonic acid (EDANS) 32,33, aminobenzoyl (Abz) 34,35, or the colorimetric label pNA 34. To our knowledge, this is the first time that this short motif has been suggested as a key fragment in experimentally deduced motifs of biological significance. This is supported by data showing that an amino acid sequence alignment of the SARS-CoV-2 and SARS-CoV-1 replicase polyproteins reveals high overall identity, with noticeable conservation at the polyprotein cleavage sites 36. The motif appears up to 10 times in the SARS-CoV-2 genome, 9 times in the pp1ab-encoding orf and once in the Spp-encoding orf, but only two occurrences are tCP conserved in all CoV genera: one in the pp1ab-encoding orf and the other in the Spp-encoding orf, at positions 18841-18860 and 23605-23628, respectively (Table 5).
A detailed analysis of the V-A-S motif shows why the information supplied by the tCP code is complementary to that obtained from the genetic code. The fact that the V-A-S motifs are identical in SARS-CoVs does not necessarily imply that they are highly tCP conserved in all CoV genera. In fact, there are notable differences in their tCP conservation, which indicate the evolutionary relevance of each one. The V-A-S motif can be considered an example of how viruses incorporate simple motifs into their protein sequences to mimic human proteins and enhance their functional capabilities in host cells during infection. In addition, we suspect that the V-A-S motif could have a biological role in CoV genomes because of the high number of copies present in pp1ab compared with the rest of the viral proteins, where few if any exist. Despite the small size of the motif, its presence in pp1ab is much higher than expected relative to any other genome region (Supporting Table 1).
Another motif with possible biological relevance in the beta-SARS-CoV Spp could be motif 82, encoding G-K-Y-E-Q (TT GGA AAG TAT GAG CAG TA). It is located at the end of the Spp, attached to the Spp fragment Y-I-K-W-P-W-Y-I-W-L, a highly conserved tryptophan (W)-rich membrane-proximal external region (MPER) present in members of the Coronaviridae family, and more specifically in beta-SARS-CoV-2 [37][38][39][40][41][42], forming the polypeptide G-K-Y-E-Q-Y-I-K-W-P-W-Y-I-W-L. Mutational studies have shown that W residues in the MPER are essential for effective viral infection and that the MPER could serve as an inhibitor of the entry process 43. The functional reason for the tCP-conserved motif G-K-Y-E-Q being attached to the MPER could be related to the highly hydrophobic nature of the MPER peptide, which requires the upstream charged residues K-Y-E-Q to increase its aqueous solubility 44.
Motif 9 (Table 5) is also of interest. Segments of the motif are identical to functional fragments of the high mobility group box 1 (HMGB1) protein, which is actively secreted by inflammatory cells in response to pathogen-associated molecular patterns 48,49 and is probably related to SARS-CoV pathology 39. An anti-HMGB1 monoclonal antibody has been developed 50 that specifically recognizes the C-terminal tail of motif 9, D-E-D-E-E-E. The motif contains a long acidic tail comprising mostly glutamic (E) and aspartic (D) acids, similar to that described in HMGB1 51. A current hypothesis is that immunization with anti-HMGB1 antibodies would confer protection against SARS-CoV-2 injuries 52 by inhibiting the protease. Proteolytic processing of the CoV replicase is essential for ongoing viral RNA synthesis 53. The enzymatic activities of both PLpro and 3CLpro are essential for the viral life cycle, which is why the beta-SARS-CoV proteases are attractive targets for the development of antiviral drugs to reduce viral replication and pathogenicity 54.
The Spp is used by CoVs to bind to their cellular receptors. The crystal structure of the beta-SARS-CoV-2 S receptor-binding domain (RBD) bound to the ACE2 receptor has been determined 55. Binding triggers fusion between the cell and SARS-CoV membranes, allowing virus entry into the cell 55,56. Within the SARS-CoV-2 RBD there is a receptor-binding motif (RBM) containing most of the residues that contact ACE2 55. Inserted in the RBM is RNA motif 80, TG TAT AGA TTG TTT AGG AAG T, encoding the peptide Y-R-L-F-R-K (Table 5), which is highly conserved in beta-CoVs although less conserved in other CoV genera. We think the motif conserves residues Y and R because they could be determinant for the effectiveness of the Spp/ACE2 interaction. We base this suggestion on the high percent identity of tCP-conserved NTs from other CoV genera (67%). Only five motifs meeting the tCP selection criteria were selected in the Spp-encoding orf, and only one is located in the RBM. In the Spp RBM, the percentage of tCP-conserved residues between SARS-CoVs and other CoV genera is modest, indicating that the protein motif would be specific to beta-SARS-CoVs; therefore, the relevance of motif 80 is dictated by its location in the RBM and its involvement in the Spp/ACE2 interaction. The Spp also contains the peptides R-R-A-R-S-V-A-S and R-G-D-E-V-R, encoded in RNA motifs 81 and 79, namely (CGG CGG GCA CGT AGT GTA GCT AGT) and (T AGA GGT GAT GAA GTC AGA C). Motif 79 is located in the RBD outside the RBM, and motif 81 is located outside the RBD; they have 85% and 75% identity of tCP-conserved NTs, respectively, and 67% and 75% tCP-conserved residues in CoV genera other than beta-SARS-CoVs.
The V-A-S fragment inserted in motif 81 therefore appears again, now in the SARS-CoV-2 Spp, where it is relevant for two main reasons. First, effective conformational changes of the S protein leading to membrane fusion are known to require both receptor binding and proper protease activation; an R-R-A-R furin site was found 55 in the S protein motif R-R-A-R-S-V-A-S between the S1 and S2 subunits of the SARS-CoV-2 S protein, which is encoded in RNA motif 81 together with the highly conserved V-A-S fragment. Second, this RNA fragment found in the beta-SARS-CoV-2 S protein is absent from the human, civet and bat SARS-CoV S proteins, highlighting a fundamental difference between these CoVs in a region especially sensitive for survival. The SARS-CoV Spp is known to be activated by host cell proteases, with proteolytic cleavage at the S1/S2 boundary and adjacent to a fusion peptide in the S2 domain 57. Although elastase-mediated activation of SARS-CoV Spp has been considered an important factor in the severe pneumonia seen in SARS-CoV-infected patients, studies with neutrophil elastase (NE) failed to show any consistent activation of SARS-CoV S-mediated fusion 57. The V-A-S motif also appears in the Spp, and neutrophil elastase is known to prefer substrates with Val, Ala or Ser at the P1 position, which gives the V-A-S motif even more evolutionary relevance. Other factors, such as the unique R-R-A-R furin cleavage site at the S1/S2 boundary of the SARS-CoV-2 Spp, could play a role in facilitating rapid human-to-human transmission 55.
The MPER plays an important role in the function of the S protein. MPERs rich in aromatic residues, with three or four W and two or three Y residues, are present in all coronavirus S proteins 43,58. Similar sequence conservation is found in other viruses with class I fusion proteins, such as human immunodeficiency virus type 1 (HIV-1), feline immunodeficiency virus, influenza virus and Ebola virus [59][60][61][62]. The selected RNA motif 82 (Table 5), attached to the MPER, acquires biological relevance mainly for the following reasons: i) when working with synthetic peptides derived from CoV S proteins, the highly hydrophobic nature of the MPER peptide required the inclusion of the upstream charged residues K-Y-E-Q to increase its aqueous solubility and facilitate purification 44; ii) in the SARS-CoV-2 Spp, the upstream charged residues K-Y-E-Q of motif 82 are naturally attached to the MPER (this paper); iii) K-Y-E-Q-Y-I-K shares sequence homology with the cholesterol recognition amino acid consensus motif 63, which was believed to result in a peptide or protein that preferentially associates with cholesterol, a principal component of lipid rafts 44.
The matrix protein M is an integral membrane protein involved in budding that interacts with the nucleocapsid and S proteins 58,59. As a component of the viral envelope, it plays a role in morphogenesis and assembly through interactions with other viral proteins. The M protein is connected with infectivity through binding to the S protein and the surface receptor, promoting membrane fusion 60. In this context, the absence of the tCP-conserved V-A-S motif from the beta-SARS-CoV-2 M protein and its presence in the beta-SARS-CoV M protein is one of the most notable differences between the two human beta-SARS-CoVs. Supporting Table 1 shows that, from the point of view of the M protein, the V-A-S motif, as in human SARS-CoV, is also present in civet and bat beta-SARS-CoVs and in human and porcine alpha-CoVs, and absent in the rest of the CoVs. The presence of the V-A-S motif in the CoV Spp implies its absence from the matrix protein and vice versa, with one exception, porcine alpha-PEDV, which has the motif in both. Accordingly, the motif is present in the SARS-CoV-2, murine MHV, porcine PEDV and avian CoV S proteins but not in the SARS-CoV, civet and bat S proteins. These data suggest that civet and bat SARS-CoVs are evolutionarily closer to human SARS-CoV than to human SARS-CoV-2, which could be evolutionarily closer to murine MHV. The presence of the V-A-S motif in the beta-SARS-CoV M protein could play a role in the M protein, because neutrophil elastase, which is active against a broad range of extracellular matrix proteins, prefers substrates with Val>Ala>Ser, Cys at the P1 position 61.
The E (envelope) protein is the smallest of the structural proteins; it is abundantly expressed in the infected cell, although only a small portion is incorporated into the virion envelope 62. Most of the E protein is localized at the site of subcellular trafficking, where it participates in CoV assembly and budding 63. In SARS-CoV-2, the tCP method did not select any RNA motif for this small protein, as was also the case for the M protein. The only species with a V-A-S motif in the E protein was porcine alpha-PEDV (Supporting Table 1).
The SARS-CoV-2 N (nucleocapsid) protein is an RNA-binding protein that plays vital roles in i) forming helical ribonucleoproteins, ii) regulating viral RNA synthesis during replication/transcription and iii) modulating infected-cell metabolism [64][65][66]; it is highly immunogenic and abundantly expressed during infection, inducing immune responses against human SARS-CoV and SARS-CoV-2 67,68. Most of the CoVs studied, except common moorhen delta-CoV (Supporting Table 1), lack the V-A-S motif in the N protein; however, the tCP method selected six RNA motifs conserved in beta-SARS-CoVs, with percentages of RNA conservation in other CoV genera ranging between 52% and 72% (Table 4).
In summary, reading the SARS-CoV-2 genome fully overlapping and categorizing RNA triplets by gross composition (Table 2) revealed a new set of short RNA motifs. Some motifs show experimentally deduced functional characteristics, and the remaining motifs, although their function is unknown today, should be taken into account in functional studies. For example, to our knowledge, the V-A-S peptide encoded in motifs 68 and 81 has not been described before, yet it is present in many functionally relevant protein fragments of CoVs. The same holds for other peptides, such as those encoded in motifs 9, 68, 79, 80, 81 and 82 (Table 5). In some cases, such as motif 80, the motif is inserted in a functionally relevant region, the RBM of the beta-SARS-CoV-2 S RBD that binds the ACE2 receptor; in other cases, such as motif 82, it is attached to functionally relevant fragments such as the MPER, increasing its aqueous solubility and facilitating its purification. It is unlikely that these new motifs emerged at random, inserted in or flanking functional CoV genome fragments in general and in SARS-CoV-2 in particular. We suspect that the selected RNA motifs could play significant roles in the fitness and survival of CoVs because of their high degree of NTs-tCP conservation across genera. They could be suitable for inhibition and vaccine studies, all the more valuable because these motifs have not been observed before.