The use of artificial neural replicators for the analysis of nucleotide sequences of viroid RNA was presented in [1]. The main idea of the approach was to differentiate classes of viroids according to their "pathogens": virtual entities that can replicate using the information contained in the viroid sequences. Artificial neural replicators were used in [1] as candidates for the role of such artificial pathogens. In that work, specific representations of nucleotide sequences were introduced using two incomplete binary codes, and it was found that the self-reproduction of neural replicators differs between the two cases. It also turned out that in many cases the patterns transmitted by neural replicators (fuzzy motifs) can have interesting symmetry and contain useful information for further analysis.
Here we present the results of the analysis of the genome sequences of both RNA and DNA viruses, with a focus on double-stranded human papillomaviruses. We start with a brief overview of neural replicator analysis (NRA), then briefly touch on the features of replicator tables (RT) for viral genome sequences of different sizes, and devote the main part of the article to the analysis of more than 200 types of human papillomavirus. In conclusion, a discussion of the results obtained is presented.
Neural Replicators Analysis
The basic artificial neural network model used in Neural Replicators Analysis (NRA) is the self-reproducible neural network (neural replicator) [1, 3, 4]. This model adds to the standard Hopfield network [2] a mechanism that synchronously changes the threshold of all neurons, which have binary states x_i (+1 or −1). The remarkable phenomenon observed in such a system [1, 3, 4] is that, in a chain of networks, after a few steps of transmission a special network arises that transmits further exactly those patterns it learned from its neighbor. This network therefore produces an exact copy of itself, that is, it is self-reproducible. Self-reproducible networks are completely transparent: during the cycle of threshold growth they show all learned patterns as quasi-attractors. The model [1] assumes that neurons take binary values. Although many generalizations of this model lift this restriction, a binary coding scheme was used for genomic analysis in the previous paper [1]. That paper used a non-traditional representation of nucleotide sequences: instead of the four-letter genetic code, two binary code schemes were introduced. The first code (called the WS code) combines the Watson-Crick pairs (AT) and (CG) and presents them as a weak (AT) pair encoded by “−1” and a strong (CG) pair encoded by “+1”. The second, keto-amino (KM) code combines a wobble pair (TG) encoded by “+1” and a less stable (AC) pair encoded by “−1”. These two incomplete codes were used to construct sets of networks of different sizes K, starting from 3, with Hebbian interconnections calculated from the patterns generated by sliding a window of length K along the nucleotide sequence of N nucleotides. The N resulting patterns (note that their number does not depend on K) are used to form the Hebbian interconnection matrices of the two fully connected parent Hopfield networks (for the WS- and KM-coded patterns, respectively).
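The encoding and the construction of the parent network described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in [1]. Several details are assumptions: the window is taken to wrap around the end of the sequence (which is one way to obtain exactly N patterns for any K), U is treated like T so that RNA sequences can be encoded, and the Hebbian matrix is the unnormalized sum of outer products with a zero diagonal. The function names are hypothetical. The threshold dynamics that decide whether a replicator arises are not reproduced here.

```python
import numpy as np

# WS code: weak pair (A, T) -> -1, strong pair (C, G) -> +1.
# Mapping U like T is an assumption made here for RNA input.
WS = {"A": -1, "T": -1, "U": -1, "C": 1, "G": 1}
# KM code: keto pair (T, G) -> +1, amino pair (A, C) -> -1.
KM = {"T": 1, "U": 1, "G": 1, "A": -1, "C": -1}


def encode(seq, code):
    """Translate a nucleotide string into a vector of +1/-1 values."""
    return np.array([code[ch] for ch in seq.upper()], dtype=int)


def window_patterns(x, k):
    """Return N windows of length k, one per start position.

    Assumption: the window wraps around the sequence end, so the
    number of patterns equals N regardless of k.
    """
    n = len(x)
    idx = (np.arange(n)[:, None] + np.arange(k)[None, :]) % n
    return x[idx]


def hebbian_matrix(patterns):
    """Sum of outer products of the patterns, with a zero diagonal.

    Assumption: no normalization and no self-connections.
    """
    w = patterns.T @ patterns
    np.fill_diagonal(w, 0)
    return w


if __name__ == "__main__":
    seq = "ACGT"  # toy sequence, for illustration only
    for name, code in (("WS", WS), ("KM", KM)):
        pats = window_patterns(encode(seq, code), k=3)
        print(name, hebbian_matrix(pats).tolist())
```

Each of the two matrices defines one parent network for a given window size K; repeating the construction for K = 3, 4, ... gives the candidate networks whose replication is then tabulated.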
For a sliding window of a given size, the source parent network can generate either a nontrivial replicator with a non-empty set of patterns for transmission, or a non-replicating network with an empty set of patterns for transmission. The latter network cannot generate descendants or, in other words, cannot breed. This situation is rather common for some virus types but, in general, the RTs of viruses have a non-trivial form. In [1] it was demonstrated that, despite the wide range of RT forms, a reasonable approximate categorization of two viroid families can be derived. Another interesting phenomenon is connected to the patterns transmitted by replicators, the fuzzy motifs. It was shown in [1] that these patterns contain additional information and often have interesting symmetries and periodicities. In this paper it is demonstrated that these symmetries can be useful for differentiating the genera of human papillomaviruses. Indeed, in this case the set of WS-coded transmitted patterns reduces to a single pattern for the network of maximal size (replicators of larger sizes do not exist), so the analysis of this set is easy to interpret. More details on the additional results of applying NRA to the rather short viroid genomes are presented in [1].
Application To Virus Genomes
Obviously, the approach proposed in [1] and applied to the analysis of viroids can also be applied to virus genomes. Hepatitis delta virus (HDV) has the smallest genome, a circular RNA closely resembling the RNA genome of viroids [5], and its replicator table has a form typical of viroids, as well as of viruses with small genomes, such as narnaviruses or mitoviruses (see Fig. 1).
The replicator tables of other hepatitis viruses have different shapes but, as we will see below, the RTs of hepatitis A, C, and E viruses have forms that are fairly common for viruses with medium and large genomes. The main feature of these viruses is that they have no replicators generated from KM-encoded DNA or RNA sequences (Fig. 2).
Let us pay attention to the RT of the hepatitis E virus (Fig. 2E). It also lacks replicators when its RNA sequence is represented by the WS code. For hepatitis A and C, these replicators exist, but only up to a certain maximum size of the neural network: five for hepatitis A and eight for hepatitis C. For hepatitis A, moreover, replicators exist for all smaller network sizes (starting from 3). We will call such RTs monotonic. In contrast, for the hepatitis C virus, there is no replicator for network size 6. We will call such RTs nonmonotonic and will use asterisks below to mark the corresponding virus data.
From here on we can disregard the right-hand columns of the RT and use only the maximal size of the replicators corresponding to the WS code. This information may seem rather poor, particularly since the RT for the SARS-CoV-2 isolate 2019-nCoV WHU01 (29881 bp, ssRNA(+)) is the same as the RT for hepatitis A. In such cases, however, we can use additional information: the patterns that the replicator networks transmit. For example, for hepatitis A the replicators of size 5 are built on only one pattern, (+1, +1, −1, +1, +1), while for the SARS virus they are built on two patterns: (+1, +1, −1, +1, +1) and (−1, +1, −1, −1, +1). As we will see, the forms of the RTs and of the transmitted patterns can give us interesting information about the similarities of virus genomes and about their divergence.
Analysis Of Human Papillomaviruses
Here, we apply the approach described above to the analysis of human papillomavirus (HPV) genomic sequences. HPVs are small, non-enveloped, double-stranded DNA viruses belonging to the Papillomaviridae family, which was designated as a separate family in the 7th ICTV report [6]. The taxonomy of these viruses is usually based on the nucleotide sequences of the main viral capsid protein L1 [7]. HPV types belonging to different genera share less than 60% similarity within the L1 sequence of the genome. Different types of viruses within a genus have 60 to 70% similarity. A new HPV type has less than 90% similarity to any other HPV type. The papillomavirus nomenclature at the species level and above is determined by the papillomavirus study group of the International Committee on Taxonomy of Viruses [8]. Human papillomaviruses are classified into 5 genera (Alpha, Beta, Gamma, Mu and Nu) containing many species and types. The number of types increases linearly with time for the Beta genus and extremely rapidly for the Gamma genus; the rate of detection of HPV types is increasing, mainly as a result of metagenomic sequencing [9]. Here we use the species and type taxonomy data provided in [10]; the relevant NCBI and GenBank references are provided in the Materials section.
Instead of RTs, which in this case contain no replicators for KM-encoded sequences at a genome size of about 8000, we will use a more convenient visualization that shows only replicators obtained with the WS code. We will use colors to mark the maximum size Nmax of the replicator neural network generated from the WS-encoded genome sequence. The absence of replicators is marked in black, the presence of only a replicator of size 3 in purple, a maximum size of 4 in blue, 5 in light blue, 6 in green, 7 in yellow, 8 in orange and 9 in red (see Fig. 3). It turns out that this set of colors is sufficient to characterize the replicators of maximum size for all types of human papillomaviruses studied. We will also use one or two asterisks to denote cases with non-monotonic sets of replicators (below the maximum size), when a replicator does not exist for one or two smaller sizes, respectively. We start with the genus Alpha papillomavirus and present the results of its study in Fig. 3.
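The summary used in the figures (maximum replicator size, color, and asterisks for gaps) can be expressed as a short Python sketch. The function name and the input format, a set of network sizes for which a WS replicator exists, are hypothetical; the color assignment follows the list given above. The hepatitis C input assumes that size 6 is the only gap below the maximum size 8, which is how the description of its RT is read here.

```python
# Color assigned to each maximum replicator size, as listed in the text.
COLORS = {
    None: "black",  # no replicators at all
    3: "purple",
    4: "blue",
    5: "light blue",
    6: "green",
    7: "yellow",
    8: "orange",
    9: "red",
}


def summarize_ws_column(sizes, min_size=3):
    """Return (Nmax, color, asterisks) for one genome.

    `sizes` is the set of network sizes K for which a WS replicator
    exists. One asterisk is added for every size between `min_size`
    and Nmax that has no replicator (a non-monotonic table).
    """
    sizes = set(sizes)
    if not sizes:
        return None, COLORS[None], ""
    n_max = max(sizes)
    missing = [k for k in range(min_size, n_max) if k not in sizes]
    return n_max, COLORS.get(n_max, "unassigned"), "*" * len(missing)


if __name__ == "__main__":
    # Hepatitis A: replicators for all sizes from 3 to 5 (monotonic).
    print(summarize_ws_column({3, 4, 5}))
    # Hepatitis C: maximum size 8, no replicator of size 6.
    print(summarize_ws_column({3, 4, 5, 7, 8}))
```

A table is monotonic exactly when the returned asterisk string is empty.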
Alpha papillomaviruses
A standard characterization of this genus is that “Alpha HPVs preferentially infect the anogenital and oral mucosa, causing both malignant and benign neoplasms. Cutaneous lesions have also been observed” [11]. One of the oncological diseases connected with the Alpha genus is cervical cancer (note, however, that HPVs belonging to the Beta and Gamma genera are also considered carcinogenic cofactors of cervical cancer [12]).
Several interesting observations can be made from Fig. 3. First, the Alpha-2 and Alpha-4 species, which have large replicators (up to size 9), differ significantly from the other species of the Alpha papillomavirus genus. It is noteworthy that, unlike the types of other species, it is the types of these two species that cause skin warts (as stated in [7], “The genus alpha also includes a few cutaneous HPV types (HPV2, 3, 7, 10, 27, 28, and 57), which cause common and plantar warts”). In addition, most types of high oncogenic risk (HR) are characterized by the absence of replicators (black boxes). Thus, using NRA, we can recognize a clear division of the genus Alpha into two subgenera, which was not obtained by the method based on the similarity of the nucleotide sequences of the main capsid protein L1 [7].
More information can be obtained by considering the patterns transmitted by replicators of maximal size. In all cases they transmit a single pattern, which for all types of the species Alpha-2, Alpha-4 and also Alpha-3 is periodic with a period of 2 (see Fig. 4). With only one very interesting exception (discussed below), such periodic transmitted patterns are typical only of the Alpha papillomavirus genus. However, for the species Alpha-14, non-periodic patterns are transmitted by a replicator of size 5. To clarify the situation with Alpha-14, let us consider the Beta genus of human papillomaviruses.
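Whether a transmitted pattern is periodic can be checked mechanically. The sketch below is one possible reading, not the criterion used in the paper: a pattern is called p-periodic if every element equals the element p positions later, and p is limited to at most half of the pattern length so that at least two full repeats are present. Both the definition and the function name are assumptions, and the example patterns are illustrative rather than taken from the figures.

```python
def smallest_period(pattern):
    """Return the smallest period p of a +1/-1 pattern, or None.

    Assumption: p must satisfy p <= len(pattern) // 2, so that the
    repeating unit occurs at least twice; otherwise very short
    patterns would count as periodic almost trivially.
    """
    n = len(pattern)
    for p in range(1, n // 2 + 1):
        if all(pattern[i] == pattern[i + p] for i in range(n - p)):
            return p
    return None


if __name__ == "__main__":
    # Alternating pattern of length 7: period 2.
    print(smallest_period([1, -1, 1, -1, 1, -1, 1]))
    # Repeating unit (+1, +1, -1) of length 6: period 3.
    print(smallest_period([1, 1, -1, 1, 1, -1]))
    # No repeating unit of length 1 or 2: reported as non-periodic.
    print(smallest_period([1, 1, -1, 1, 1]))
```

Under this definition, two genomes can be compared both by the maximum replicator size and by the period (or absence of a period) of the single pattern transmitted at that size.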
Beta papillomaviruses
Beta HPVs cause only skin lesions and exist in a latent form in the general population, but are activated under conditions of immunosuppression [11]. Under the influence of certain cofactors, Beta HPV types can also trigger a malignant process. Recent studies point to the role of Beta HPV types and HPV-associated inflammation in the development of squamous cell skin cancer (the second most common non-melanoma skin cancer after basal cell carcinoma). However, Beta HPV infection appears to play an important role in initiating carcinogenesis, not in tumor progression [13]. NRA shows that the coloring of Beta papillomaviruses (Fig. 5) differs from what we observe for Alpha papillomaviruses (Fig. 3). It is characterized by a predominance of replicators with a maximum size of 5 (light blue boxes), a lack of larger replicators (such as those of Alpha-2 and Alpha-4), and a small number of types without any replicators (Fig. 5). Even more remarkably, all 5-neuron replicators transmit a single pattern that is identical to the non-periodic pattern of types HPV90 and HPV106 of the species Alpha-14 (Fig. 3). So, in terms of neural replicator analysis, should the species Alpha-14 be moved to the genus Beta or to another genus? We can clarify this by looking at the genera Gamma and Mu (we realize that this analysis is rather rough and does not claim to draw any solid conclusions). We also note that Beta-1 is the only species of human papillomaviruses containing types (HPV8, HPV47, HPV99) that have transmitted patterns of length 4, and type HPV8 has a complex set of such patterns, with four members instead of one. This type, along with HPV5, HPV20 (Beta-1), HPV17 (Beta-2) and especially HPV28 (Alpha-2), is also unique among all human papillomaviruses. It has a pattern length of 6 and is associated with a high risk of developing squamous cell skin cancer [13].
Gamma papillomaviruses
The Gamma papillomavirus genus is highly diverse, and most healthy adults chronically shed Gamma virions from apparently healthy skin surfaces. Recent metagenomic studies have nearly doubled the number of known Gamma HPV types [14]. While the Beta papillomavirus genus is related to epidermodysplasia verruciformis, patients with WHIM syndrome (warts, hypogammaglobulinemia, infections, myelokathexis) have been found to be uniquely susceptible to Gamma HPV-associated skin warts.
Neural replicator analysis of Gamma papillomaviruses shows that they share some properties with Beta papillomaviruses, but also differ from them. Like Beta papillomaviruses, their types can form replicators with a maximum size of up to 5. More importantly, the only kind of non-periodic transmitted pattern is the same as that of Beta papillomaviruses. The number of species of Gamma papillomaviruses is large and, as can be seen from Fig. 6, the proportion of Gamma papillomaviruses that do not generate neural replicators (black boxes) exceeds 60%, while for Beta papillomaviruses this figure is about 11%. Thus, we can assume that the species Alpha-14 is more similar to Beta than to Gamma human papillomaviruses.
Mu papillomaviruses
Mu papillomaviruses are among the HPV types associated with cutaneous disease. The HPV1 type is responsible for about 30% of cases of common warts [15]. The results obtained for Mu human papillomaviruses show that they are similar to those for Beta and Gamma papillomaviruses (see Fig. 7, left): type HPV63 has the same non-periodic transmitted pattern as the Beta and Gamma genera.
Nu papillomaviruses
The most interesting result of NRA was obtained for the Nu human papillomavirus (HPV41). Initially, this virus was isolated from a facial wart, but subsequently its DNA was found in some skin carcinomas and precancerous keratoses [16]. The genomic sequence of this virus is only distantly related to all other types of human papillomaviruses, and HPV41 has been identified as the first type of the new Nu genus. NRA shows, however, that it fits well into the Alpha-2 species, because it has a maximal replicator size of 7 as well as the same periodic transmitted pattern (Fig. 7, right). The clinical manifestations of Nu virus infection are similar to those of the Alpha-2 species (although it also causes malignant skin lesions), so this result is not inconsistent with the characteristics of this genus. It is also interesting that NRA may provide some additional information about the problem of virus transfer to another host, as well as about the taxonomy of viruses. As the paper by Van Doorslaer says [17]:
“Because of the absence of cross-species infections, it is unlikely that horizontal gene transfer played any role in the evolution of the Papillomaviridae. In fact, a study specifically looking at the influence of horizontal gene transfer identified only a single potential cross-species transmission event. This event involved ancestors of a porcupine (EdPV1) and human (HPV41) papillomavirus [18]. These two viruses are the only members of a divergent genus (Nu papillomavirus); it will be of interest to see how the inclusion of more viruses in this genus will affect the conclusion of cross-species infection [17]”.
In this situation, it was very interesting to use NRA to study the porcupine EdPV1 virus. It turned out that it does have a replicator of maximum size 6 (7 for HPV41), but the pattern transmitted by this replicator (3-periodic) differs significantly not only from the 2-periodic pattern of HPV41, but also from any pattern transmitted by the replicators of all papillomaviruses (see Fig. 7, right bottom). Thus, from the point of view of NRA, the porcupine sigma virus EdPV1 cannot be combined with the HPV41 virus into one genus, nor can it be assigned to any other genus of human papillomaviruses.