Background: the origin of the SARS-CoV-2 virus remains elusive. However, the finding that bats in Laos host the closest known relatives of the virus behind COVID-19, and that these relatives can infect human cells even without the furin site, is a giant stride toward identifying the progenitor of the virus. The mystery centers on the CGG-CGG codons that encode the arginine dimer of the SARS-CoV-2 furin site. This genetic footprint has not been observed in natural coronaviruses.
Results: using a bioinformatic approach, I found that the two possible 12-nucleotide fragments (and their reverse complements) which, when properly inserted within codon S680 of the S gene of the closest relative of the virus, encode the PRRA furin site motif have 100% matches to several human NCBI RefSeq curated mRNA protein-coding transcripts. The genes to which these transcripts belong are either ubiquitously and highly expressed, e.g., the alpha subunit of the mitochondrial ATP synthase F1 (ATP5F1) and/or the ubiquitin specific peptidase 21 (USP21), or specifically and highly expressed in tissues such as small intestine, duodenum, brain, kidney and gonads, which are gateways for SARS-CoV-2 infection. In addition, I identified a significant group of virus specimens in which the arginine dimer of the furin site is codon-optimized according to the SARS-CoV-2 codon usage bias. Another main finding was the discovery of other PRRA-like insertions in the S protein of some SARS-CoV-2 specimens.
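The reason there are "two possible" 12-nucleotide fragments can be illustrated with a short Python sketch: when an insertion falls inside a codon, more than one fragment and offset can produce the same final sequence. The `parent` and `child` strings below are short illustrative toy sequences chosen for this example, not data taken from this work, and the function names are hypothetical. The search against RefSeq transcripts is not reproduced here.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA string in frame 0, ignoring trailing partial codons."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def candidate_insertions(parent, child, size=12):
    """List every (offset, fragment) whose insertion into parent yields child."""
    hits = []
    for offset in range(len(parent) + 1):
        fragment = child[offset:offset + size]
        if parent[:offset] + fragment + parent[offset:] == child:
            hits.append((offset, fragment))
    return hits


# Illustrative toy sequences (assumptions, not data from this study):
parent = "AATTCACGTAGT"              # translates to NSRS
child = "AATTCTCCTCGGCGGGCACGTAGT"   # translates to NSPRRARS

for offset, fragment in candidate_insertions(parent, child):
    # Each candidate and its reverse complement would be a separate query.
    print(offset, fragment, reverse_complement(fragment))
```

In this toy case both candidate offsets fall inside the serine codon, so neither fragment is in frame on its own, yet the resulting sequence reads through as S-P-R-R-A. Each fragment and its reverse complement would then be searched separately against a transcript database.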
Conclusions: the first conclusion of this work is that SARS-CoV-2 is erasing the trace of the non-viral origin of the furin site CGG-CGG arginine pair through codon optimization. Based on the results shown here, together with the new Laos coronaviruses, the main conclusion is that genetic recombination of unrelated RNA sequences between a closest relative of SARS-CoV-2 and human transcripts, during undetected viral circulation in humans, could be the origin of the polybasic furin cleavage site.