Homologous classification and genomic sequence flow of LASV in West Africa.
Phylogenetic analysis of Lassa mammarenavirus and other arenaviruses reveals how the different viruses are distributed across the globe. The analysis classified the viruses into three distinct monophyletic groups. The Old World arenaviruses, mainly from African countries, comprise Lassa mammarenavirus, Mopeia virus AN20410, Morogoro mammarenavirus, Luna mammarenavirus, Mobala mammarenavirus, Ippy mammarenavirus, Merino Walk mammarenavirus, Lymphocytic choriomeningitis mammarenavirus, and Lujo mammarenavirus. The New World arenaviruses, mostly found in North and South America, comprise Allpahuayo mammarenavirus, Flexal mammarenavirus, Cali mammarenavirus, Pirital mammarenavirus, Brazilian mammarenavirus, Paraguayan mammarenavirus, Cupixi mammarenavirus, Whitewater Arroyo mammarenavirus, Machupo mammarenavirus, Bear Canyon mammarenavirus, Argentinian mammarenavirus, Chapare mammarenavirus, Tacaribe mammarenavirus, Oliveros mammarenavirus, Guanarito mammarenavirus, Latino mammarenavirus, and Tamiami mammarenavirus [28]. The third group, the reptarenaviruses, have reptile hosts such as snakes and comprise University of Giessen virus, Golden Gate virus, ROUT virus, University of Helsinki virus, and CAS virus, as shown in Figure 1.
GenBank and the Virus Pathogen Resource database (ViPR) recorded a total of 1,903 genomic sequences for LASV, of which 1,796 were sample sequences from Africa. Of these, 347 are of multimammate mouse (Mastomys natalensis) origin, 1,374 are from human hosts, and 75 are of unknown source. The distribution of the sequences across endemic West African countries is shown in Figure 2.
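The host breakdown above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys and the helper name `host_shares` are hypothetical, and the counts are the ones quoted in the text.

```python
# Host counts for the African LASV sequences, as quoted in the text.
# The key names are illustrative labels, not database field names.
HOST_COUNTS = {
    "mastomys_natalensis": 347,
    "human": 1374,
    "unknown": 75,
}
AFRICAN_TOTAL = 1796  # African sample sequences reported in the text


def host_shares(counts):
    """Return each host's share of the total, in percent."""
    total = sum(counts.values())
    return {host: 100.0 * n / total for host, n in counts.items()}


if __name__ == "__main__":
    # The three host categories should add up to the African total.
    print("sum of hosts:", sum(HOST_COUNTS.values()))
    for host, share in host_shares(HOST_COUNTS).items():
        print(f"{host}: {share:.1f}%")
```

The three categories add up to the 1,796 African sequences, with human-derived sequences making up roughly three quarters of them.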
Transcription Starting Site (Promoter) and TATA Box in LASV Genome
Table 1 shows the predicted promoter sites of the two Lassa mammarenavirus genome segments, S and L. The homologous genes MK291249.1 and MH157036.1 had the highest alignment scores against strains isolated at different locations in Nigeria. The predicted promoter spans positions 2917 to 2947 in the S segment and positions 1863 to 1894 in the L segment, as predicted by the promHG promoter prediction tool.
Table 1: Promoter site and TATA Box in LASV Genome.
S/N | Segment | Homologous Gene | Predicted Promoter Site + TATA Box | Position (bp)
1 | S_Segment | MK291249.1 | ATATAAACACCTGAGCTTAGTGGCCTTTCTG | 2917 - 2947
2 | L_Segment | MH157036.1 | ATATAAACGTCTCAAAGAATGAATGATGTGGC | 1863 - 1894
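As a consistency check on Table 1, the length of each predicted promoter sequence can be compared with its reported coordinates, and the sequence can be scanned for a TATA-like motif. The sketch below assumes the positions are 1-based and inclusive and uses `TATAAA` as the motif; both are assumptions made for illustration, not settings taken from the prediction tool.

```python
# Values transcribed from Table 1.
# Assumption: positions are 1-based and inclusive.
PROMOTERS = {
    "S_Segment": ("ATATAAACACCTGAGCTTAGTGGCCTTTCTG", 2917, 2947),
    "L_Segment": ("ATATAAACGTCTCAAAGAATGAATGATGTGGC", 1863, 1894),
}

# Assumed consensus core, used only for this illustrative check.
TATA_MOTIF = "TATAAA"


def check_promoter(seq, start, end):
    """Return (length_matches, motif_offset) for one predicted promoter.

    length_matches is True when the sequence length equals the span
    implied by the coordinates; motif_offset is the 0-based index of
    the first TATA_MOTIF occurrence, or -1 if it is absent.
    """
    expected_len = end - start + 1
    return len(seq) == expected_len, seq.find(TATA_MOTIF)


if __name__ == "__main__":
    for segment, (seq, start, end) in PROMOTERS.items():
        length_ok, offset = check_promoter(seq, start, end)
        print(segment, "length matches:", length_ok, "motif offset:", offset)
```

Under these assumptions, both sequences match the lengths implied by their coordinates (31 and 32 bp), and both begin with the same `ATATAAAC` stretch that contains the motif.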
Effective LASV siRNA
Table 2 lists the 10 Lassa mammarenavirus siRNAs designed for each genome segment with BLOCK-iT RNAi Designer, which selects candidates using a statistical analysis of validated siRNAs and a proprietary algorithm. For each designed oligo, the table gives the target region, either the open reading frame (ORF) or the 5′ untranslated region (5′ UTR, the leader RNA), followed by the GC content. The last column shows the BLAST results for the highly species-specific genes matched by the designed RNAi DNA sequences.
Table 2: LASV siRNA and targeted genes in humans.
S/N | S_Segment (MK291249.1): Sequence (DNA) | Region | GC% | L_Segment (MH157036.1): Sequence (DNA) | Region | GC%
1 | GCTACAAACTCTAGAGCTA | 5′ UTR | 42.11 | CCATTGAACTCTTTGTCTT | ORF | 36.85
2 | GCTAACCACTGTGGGACTA | 5′ UTR | 52.64 | CCACAAACCCAGATGCTAT | ORF | 47.37
3 | GCAAGCAGACAACATGATA | 5′ UTR | 42.11 | GCTAAGTGCTTCAGAATTA | ORF | 36.85
4 | GCATATGGCATAGATCTTT | ORF | 36.85 | GCACAACATTCCTTACTTA | ORF | 36.85
5 | CCATGAGAATATTTGGCAT | ORF | 36.85 | GCATAACACTTTGAGCATT | ORF | 36.85
6 | GCATACAAGCTCCAGCTTT | ORF | 47.37 | GCACCTTACAACCTGGTAT | ORF | 47.37
7 | CCTAACAACTCCGTCTCTT | ORF | 47.37 | GCAAGGAACCTATCACCAT | ORF | 47.37
8 | GCTGCTGTGTACTCAAATT | ORF | 42.11 | GCTTGTCAGTTAGAACATT | ORF | 36.85
9 | GCAGGTCATCTGAGGTCAA | ORF | 52.64 | CCAACAGACTCCAAATCAT | ORF | 42.11
10 | GCATTAAACGCTGCACATT | ORF | 42.11 | GCTAACTTCTGTCTTGATA | ORF | 36.85
Targeted Genes by BLAST Alignment:
NM_004446 CD742789
CR933660 NM_004446
NM_030625 BX649078
CA392182 NM_030625
AK129490 NM_001402
NR_002728 AI873453
NM_138459 NM_001001890
AK021513 NM_004446
XR_109175 NR_027024
NM_030625 AK126737
AK125883 NM_004446
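The GC% column in Table 2 can be reproduced approximately from the sequences themselves. The sketch below uses the plain definition of GC content (G and C bases divided by sequence length); the function name `gc_percent` is hypothetical, and how the design tool rounds its values is not stated in the text. Some table entries, such as 36.85, differ from the plainly rounded value (36.84) in the last digit, which may reflect the tool's own rounding.

```python
def gc_percent(seq):
    """Return the GC content of a DNA sequence as a percentage."""
    seq = seq.upper()
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


if __name__ == "__main__":
    # A few siRNA sequences copied from Table 2 (row 1 and row 2).
    examples = [
        "GCTACAAACTCTAGAGCTA",  # S segment, row 1
        "CCATTGAACTCTTTGTCTT",  # L segment, row 1
        "GCTAACCACTGTGGGACTA",  # S segment, row 2
    ]
    for seq in examples:
        print(seq, f"{gc_percent(seq):.2f}")
```

All siRNAs in the table are 19 nucleotides long, so every GC% value corresponds to a whole number of G or C bases out of 19.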
Phylogenies and discrete phylogeography
A comprehensive evaluation of the genomic sequences recorded in GenBank gave an estimated total of 735 LASV sequences from 16 affected states in Nigeria between 2008 and 2018. The sequences were grouped into two regions of Nigeria (North and South): the northern states were Bauchi=33, Benue=3, Kaduna=3, Kogi=22, Nasarawa=28, Plateau=17, and Taraba=11, while the southern states were Anambra=9, Delta=18, Ebonyi=107, Edo=241, Ekiti=4, Enugu=43, Imo=10, Ondo=182, and Rivers=4, as shown in Figure 3.
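The per-state counts can be tallied to confirm the totals quoted above. This is a minimal sketch using only the numbers given in the text; the variable names are illustrative.

```python
# LASV sequence counts per Nigerian state (2008 to 2018), from the text.
NORTH = {
    "Bauchi": 33, "Benue": 3, "Kaduna": 3, "Kogi": 22,
    "Nasarawa": 28, "Plateau": 17, "Taraba": 11,
}
SOUTH = {
    "Anambra": 9, "Delta": 18, "Ebonyi": 107, "Edo": 241, "Ekiti": 4,
    "Enugu": 43, "Imo": 10, "Ondo": 182, "Rivers": 4,
}

north_total = sum(NORTH.values())
south_total = sum(SOUTH.values())
n_states = len(NORTH) + len(SOUTH)

if __name__ == "__main__":
    print("northern sequences:", north_total)
    print("southern sequences:", south_total)
    print("all sequences:", north_total + south_total)
    print("states:", n_states)
```

The tally gives 117 northern and 618 southern sequences, which together match the 735 sequences and 16 states reported; Edo and Ondo alone account for more than half of the total.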
Figure 4 shows the circular phylogenetic tree of the LASV S-segment genome, inferred by maximum likelihood under the General Time Reversible (GTR) substitution model with gamma-distributed rates among sites and a proportion of invariant sites (G+I) in MEGA-X, and visualized with the iTOL online tool. LASV lineages I to VI are indicated by different color ranges, and the 75 newly sequenced segments are shown in red.
The MCC phylogenetic tree in Figure 5 allows the within-country relationships of LASV in the different states to be inferred from the tree branches and nodes. The strains from Bauchi, Benue, Plateau, Nasarawa, and Kaduna fall in the same clades and show a distinct evolutionary relationship when their nodes are compared. Most of the sampled strains in the southern part of the country evolved from two major monophyletic groups: Ebonyi, Edo, Delta, and Anambra form the first clade, and Ondo, Kogi, and Ekiti form the second. The strains from Plateau and Rivers are genetically distributed across different clades.
Figure 5: MCC phylogenies of LASV (LASVsSgp1) among the affected states in Nigeria. Descendant nodes and branches are colored according to the most probable location. The scale bar at the bottom of the tree shows time in years before the most recent analysis.
The annotated MCC tree above can be used to trace the spread of LASV in Nigeria: its spatiotemporal distribution was mapped on Google Earth using a KML file generated with SPREAD, as shown in Figure 6.