Complete LTRs were obtained from all eight cat samples. Their length and GenBank Accession Number are provided in Table 2.
Table 2
Length expressed in nucleotides and GenBank Accession Number of the LTR sequences of the FeLV-negative cats obtained in this study.
Code | Length (nt) | GenBank Accession Number
C-3 | 542 | OP595717
C-12 | 541 | OP595706
C-16 | 542 | OP595709
C-21 | 542 | OP595711
C-30 | 541 | OP595707
C-31 | 541 | OP595713
C-33 | 542 | OP595715
C-34 | 538 | OP595708
Table 3
Transcription binding sites (TBS) in several exogenous or endogenous retroviruses. TBS are grouped by the aspects discussed below in the text (group names shown in bold). In each cell, the number or range is the number of TBS detected in the majority of genomes. Numbers in parentheses indicate the number of TBS in genomes that are exceptions to the majority.
 | Gammaretrovirus
Factors | GALV | MuLV | KoRV | FeLV-A | FeLV-B | enFeLV | C-seq | ERV-DC
Leukemia
AML-1 | 1 | 1 (2) | 1 | 1–2 (4) | 1 | 0 | 0 | 1
LVb | 1–2 | 1 (2) | 1 (0) | 1–2 | 1 | 2 (3) | 2 | 0
Cell cycle
AP-1 | 1–2 | 0 | 1 | 0 | 0–1 | 2 | 2 | 0
AP-3 | 2 | 4 | 2 | 1–4 | 3–4 | 3 | 3 | 5
AP-4 | 0–1 | 1 | 0 | 0–1 (4) | 1 | 0 | 0 | 0
NF-1 | 3–4 | 4–8 | 3–4 | 3–4 (11) | 4 | 5–6 (4) | 4–5 | 11
Ets-2 | 3–4 | 3 | 5 | 0–4 | 3–4 | 1 (5) | 1 | 6
Sp1 | 0–2 | 0 | 0 | 1 (0) | 1 | 0 | 0 | 0
Sp3 | 1–3 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Innate immunity
IRF-1 | 2–3 | 1–3 | 1 | 0–1 (3) | 1 | 3 (4) | 3 | 2
IRF-2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
IRF-3 | 2–3 | 1–3 | 1 | 0–1 (3) | 1 | 3 (4) | 3 | 2
STAT1 | 0–2 | 1 | 1 | 0 (2) | 0 | 1 | 1 | 0
STAT1β | 1–2 | 1 (2) | 0 | 0 | 1 | 2 | 2 | 2
STAT5A | 4–6 | 4 (5) | 4 | 0–1 (3) | 3–4 | 5 (6) | 5 | 4
STAT5B | 1–2 | 1 (2) | 1 | 0–1 | 1 | 0 | 0 | 0
STAT6 | 2–3 | 1 (2) | 1 | 1 (3) | 0–1 | 1 | 1 | 1
Adaptive immune response
NF-AT1 | 4–6 | 2–6 | 2 | 0–2 (6) | 2 | 6 (8) | 6 | 2
NF-AT2 | 2–3 | 1–3 | 2 | 0–2 (4) | 1–2 | 3 (4) | 3 | 3
NF-AT3 | 1 | 1 | 0 | 0–1 | 0–1 | 1 (2) | 1 | 0
NF-AT4 | 2 | 1 (2) | 1 | 2 | 2 | 1–3 | 1 (2) | 1
NF-κB | 5 (1) | 6 (2, 9) | 1 | 1 (3) | 1–2 | 4 | 4 | 1
Ets-1 | 4–6 | 6 (7) | 5 | 6–8 (10) | 8–9 | 6–8 | 6–7 | 6
Hormone stimulation
GR | 0–2 | 3–5 | 0 | 2 | 2–3 | 0 | 0 | 0
GR-α | 4–5 | 8–9 (12) | 2–3 | 4 | 4 | 3–4 (5) | 3–4 | 1
TATA and CAT boxes
C/EBP | 8–10 | 5–8 | 11 | 3–6 | 6–7 | 6 (7) | 6 | 12
TBP | 3 | 3 | 4 | 2–3 (4) | 3 | 2 (0) | 2 (0) | 2
Total range | 61–90 | 74–98 | 58–61 | 50–83 | 66–67 | 66–84 | 66–70 | 60
Average | 79.7 | 81.25 | 60 | 62.7 | 66.5 | 71.7 | 68.9 | 60
Number of sequences analyzed | 3 | 4 | 3 | 8 | 2 | 7 | 8 | 1
The identity among the eight sequences was high (97.0%–99.8%). Blastn analysis showed that they were most closely related to the enFeLV sequences with GenBank Accession Numbers AY364319, LC196053, LC198317, MH270418, MH325047, and MH325035, all of them with identity above 98.4%. The LTRs of the C-sequences had a much lower similarity with FeLV-A and FeLV-B sequences (59.63%–61.25% and 60.68%–61.06%, respectively), lower than what has been reported for the whole genome (86% (23)), and with ERV-DC (45.98%–46.96%), than with enFeLV LTRs (manuscript in preparation). Consequently, the C-sequences were considered enFeLV.
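As an illustration of what a percentage of identity between two LTRs means, the following minimal sketch computes it from a pairwise alignment. The function name `percent_identity`, the toy sequences, and the convention of skipping gapped columns are assumptions made for this example; Blastn reports identity over its own local alignments and may treat gaps differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Hypothetical helper: skipping gap columns is one of several possible
    conventions and is not necessarily the one used by Blastn.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences (not real LTR data): 8 matches in 9 ungapped columns.
toy_a = "ACGTACGT-A"
toy_b = "ACGTTCGTAA"
print(round(percent_identity(toy_a, toy_b), 2))
```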
Sixty-six to seventy TBS were identified in the C-sequences (Table 3). The TBS most abundant in the C-LTRs were Ets-1 (six or seven sites), NF-AT1 and C/EBP (six sites each), and NF-1 and STAT5A (four or five sites each) (Table 3). TBS not found in the C-sequence LTRs were AML-1, AP-4, CREB, GR, IRF-2, Sp1, Sp3, STAT3 and STAT5B. Transcription binding sites were highly conserved, and polymorphisms mostly affected the nucleotides surrounding the TBS (Figure S1). However, when compared to other enFeLV, some TBS had been lost through genome variation. For example, enFeLV LC196053 had a total of 84 TBS, while the C-sequences had 66–70 TBS, like most other enFeLV (Table 3). The sequence in the feline chromosome D2 had the same number and distribution of TBS as AY364318 and AY364319 (Figure S1). In several cases in which changes affected a TBS, the genetic changes created other TBS, suggesting that the number and type of TBS were important for the biology of either the viral sequence or the host. The integrity of most TBS in enFeLV LTRs, including the C-sequences, and their great resemblance to those in exFeLV may suggest that their presence is necessary for the virus or the cat.
This prompted us to compare the presence of TBS in different gammaretroviral strains, both exogenous and endogenous. Exogenous FeLV, including FeLV-A (eight sequences) and the FeLV-A/enFeLV recombinant FeLV-B (two sequences), to which enFeLV are closely related, were chosen to determine whether the endogenization process involved modification in the number and type of TBS. The other gammaretroviruses chosen (three or four sequences each) were the murine leukemia virus (MuLV), as it has been suggested that enFeLV may be related to MuLV, acquired from the predation of rodents by felids in the ancient past (20); koala retrovirus (KoRV), which may have been recently endogenized (27); gibbon ape leukemia virus (GALV); and another feline endogenous retrovirus of the domestic cat (ERV-DC, unrelated to enFeLV (12)) (Table 3). Putative TBS were located using the algorithm ALGGEN, which identified sites with some degree of variability.
The type and number of TBS in enFeLV was most similar to that of GALV (66–84 and 61–90, respectively). The number of TBS in KoRV was the lowest of the gammaretroviruses analyzed (58–61).
Analysis of TBS present in the selected LTRs
Since TBS affect different aspects of cell biology, they are discussed below in groups. However, their effects are pleiotropic, and any one of them is present in the promoters of genes encoding proteins that participate in different cellular events.
TATA and CAT boxes
The well-known promoter elements TATA-box and CAT-box were present in all the C-sequences as TATAA from −23 to −27 bp and CCAAT from −84 to −88 bp, except in C-30, which lacked both. These promoter sequences are located in highly conserved areas, as they are necessary to start the activity of the machinery responsible for replication. Of the 36 sequences analyzed, the TATA-box was only missing in C-30 and in AB6444 (feline ERV-DC). Consequently, these two LTR sequences were also missing the TBP (TATA-box binding protein) site, which greatly overlaps the TATA-box. TBP and some additional proteins associated with it constitute the TFIID complex, a general transcription factor that forms part of the pre-initiation complex of RNA polymerase II, as it positions this enzyme on the start site for gene transcription (28). CCAAT was not as conserved, and it was missing from the ERV-DC. Regardless of the presence or absence of CCAAT, all 36 sequences analyzed had at least one C/EBP (CCAAT/enhancer-binding protein) site, most of them between six and 12 sites (Table 3). Their position in the LTR was highly conserved: two or more sites were frequently located in the immediate vicinity of the CAT-box, around −80 bp (Fig. 3), as expected. C/EBPs play key roles in regulating cellular growth and differentiation through interaction with cell cycle proteins, and in immune and inflammatory responses; they also have specific roles in various disease processes (29) and have been described as both tumor promoters and tumor suppressors (30).
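The positions quoted above can be illustrated with a minimal sketch that scans an upstream sequence for a literal motif and reports each hit in negative coordinates. The function name `find_motif_upstream`, the toy sequence, and the convention that the last base of the input is position −1 are assumptions made for this example; the sketch lists the more distant end of each hit first, and it is a plain string search rather than the ALGGEN algorithm used in the study.

```python
def find_motif_upstream(u3: str, motif: str) -> list[tuple[int, int]]:
    """Return (start, end) of every exact motif hit in upstream coordinates.

    Assumption for this sketch: the last base of `u3` is position -1, so a
    hit ending 4 bases before the end of the string has end == -4.
    """
    u3, motif = u3.upper(), motif.upper()
    n = len(u3)
    hits = []
    i = u3.find(motif)
    while i != -1:
        start = i - n                    # first base of the hit
        end = i + len(motif) - 1 - n     # last base of the hit
        hits.append((start, end))
        i = u3.find(motif, i + 1)        # allow overlapping hits
    return hits


# Toy upstream sequence (not a real LTR): one CCAAT and one TATAA.
toy_u3 = "GG" + "CCAAT" + "G" * 10 + "TATAA" + "GGG"
print(find_motif_upstream(toy_u3, "TATAA"))
print(find_motif_upstream(toy_u3, "CCAAT"))
```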
Factors related to leukemia
One of the most striking differences between exFeLV and enFeLV (including the C-sequences) was the presence of 1–2 (exceptionally 4: AB060732) AML (acute myeloid leukemia)-1 sites in exFeLV vs none in the enFeLV. AML-1 sites were also present in all exogenous gammaretroviruses. AML-1 is a protein that participates in hematopoiesis and has a primary role in the development of all hematopoietic cell types. On the other hand, the other TBS related to leukemia (LVb, leukemia virus factor b) was present in virtually all gammaretroviruses (except for one KoRV), both exogenous and endogenous, in a well-preserved region. Nevertheless, LVb is thought to induce leukemia in combination with other TBS, as 5'-LVb/core/NF1/GRE-3'. This tandem sequence is highly conserved in a large number of murine, feline and primate C-type retroviral enhancers (31), but was missing from endogenous retroviruses, as none of them had a GRE site (see below); thus, LVb was probably inactive in these LTRs, annulling their leukemogenic potential.
Factors related to cell differentiation, proliferation and apoptosis
There are many TBS which participate in cellular processes related to cell differentiation, cell growth, proliferation, apoptosis, response to DNA damage and chromatin remodeling. Many of them were present in variable numbers in the LTRs analyzed.
exFeLV had no (or exceptionally one) AP (activating protein)-1 sites, vs two in enFeLV LTRs. The algorithm ALGGEN did not locate any AP-2 site in any of the sequences, which is surprising, as AP-2 plays a critical role in regulating gene expression during early development (32). However, using other algorithms, the AP-2 motif was found to be conserved in FeLV-A (33), so it may be that ALGGEN cannot locate AP-2 correctly. The AP factors regulate gene expression in response to a variety of stimuli, including cytokines, growth factors, stress and bacterial and viral infections (34). No endogenous retrovirus had any AP-4 site (the symmetrical DNA sequence 5'-CAGCTG-3'). AP-4 acts both as a repressor and an activator of different target genes, both viral and cellular, by binding to the E-box sequence in their promoters (35).
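The symmetry of the AP-4 sequence can be checked with a minimal sketch: a site is symmetrical (palindromic) when it equals its own reverse complement, so it only needs to be counted on one strand. The helper names `reverse_complement`, `is_palindromic` and `count_sites`, and the toy sequences, are assumptions made for this example; exact string matching is used here and is much stricter than the matrix-based search performed by ALGGEN.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True when the site reads the same on both strands."""
    return site.upper() == reverse_complement(site)


def _count_overlapping(seq: str, site: str) -> int:
    count, i = 0, seq.find(site)
    while i != -1:
        count += 1
        i = seq.find(site, i + 1)
    return count


def count_sites(seq: str, site: str) -> int:
    """Count exact occurrences of `site` on both strands of `seq`.

    A palindromic site is counted once per location, not once per strand.
    """
    seq, site = seq.upper(), site.upper()
    forward = _count_overlapping(seq, site)
    if is_palindromic(site):
        return forward
    return forward + _count_overlapping(seq, reverse_complement(site))


print(is_palindromic("CAGCTG"))            # AP-4 core sequence from the text
print(count_sites("TTCAGCTGAA", "CAGCTG"))  # toy sequence, not a real LTR
```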
Another TBS related to cell growth is NF (nuclear factor)-1. In the LTRs studied, this site had some mutations, in agreement with what is described in the literature, and it is possible that some of them affect transcription in a cell-specific way (36). It was very abundant in all gammaretroviruses, with 4–5 sites in each virus, and one of the FeLV-A LTRs (AB060732) had 11. It was also numerous in some of the MuLV (Table 3). Most of the sites located at −150 bp or less were in the close neighborhood of C/EBP and TBP sites, suggesting a collaboration of these sites in transcription.
The Sp family of transcription factors binds to GC/GT rich motifs of many promoters to regulate the expression of multiple genes involved in many cellular processes, from differentiation to apoptosis (37). ALGGEN located this TBS in exFeLV (in accordance with data from (33)) and in GALV. In exFeLV, Sp1 and C/EBP sites were over 60 bp apart, which may abrogate the possibilities of interaction between them (38).
Ets-2 transcription factor regulates genes involved in development and apoptosis. In the LTRs analyzed Ets-2 was absent only from two FeLV-A sequences, and was most abundant in KoRV (five sites), while all the other LTRs had 1–4 (Table 3).
Factors related to response to interferon and other innate mechanisms
Interferon is a molecule of the innate immune response that has several effects. Inducible expression of the type I interferon (IFN-I) genes is controlled through multiple TBS within these genes, the interferon regulatory factors or IRFs and the interferon stimulated response elements or ISREs (reviewed in (39, 40)), which function as enhancers to promote transcriptional induction by alpha/beta interferons (IFN-I). IRF-1 and IRF-3 were well represented in most LTRs analyzed. The highest number was observed in the endogenous retroviruses (enFeLV), and the lowest in their exogenous counterparts (Table 3). ISRE was not searched for in this work.
The STAT (Signal Transducer and Activator of Transcription) family of proteins also participates in processes of proliferation, immunity, apoptosis and cell differentiation (41). We have included them in this group of TBS related to innate immunity because STAT1 is involved in IFN-I signaling (42). All strains had at least one STAT1 or STAT1β site (Table 3). TBS that recognize STAT5, which is involved in hematopoiesis and has a role in leukemia (43), were very numerous in most viruses; exFeLV, which produce lymphoma and leukemia in a high percentage of animals (44), had the lowest number of this TBS (Table 3).
Factors related to adaptive immune response
The number of NF-AT (nuclear factor of activated T-cells) sites was radically different between the viruses studied. Six to eight NF-AT1 sites were present in enFeLV vs none to two in exFeLV, though one FeLV-A (MF681672) had six. NF-AT2 was also more abundant in enFeLV (three or four sites) than in exFeLV (none to two, though MF681672 had four). Differences in the number of NF-AT3 and NF-AT4 sites were not so marked (Table 3). NF-AT is a family of transcription factors important in the immune response. It was first discovered as an activator of the transcription of interleukin-2 (IL-2) in T-cells, a regulator of the T-cell response, but has since been found to participate in the regulation of many other body systems (45). NF-AT is known to bind its site in a cooperative complex with AP-1, and the activation of both is known to trigger the genes for cytokines and chemokines required for a productive immune response (46). NF-AT and AP-1 sites are close enough to cooperate in GALV, KoRV, one FeLV-B (MH116005) and enFeLV.
NF-κB (nuclear factor kappa B) also regulates the immune response and it is involved in the cell response to many stimuli, such as stress, cytokines, UV rays, and viral and bacterial antigens (47). It is an enhancer for the synthesis of the kappa light chains of the immunoglobulins by B-cells, and also regulates genes involved in the development, maturation and proliferation of T-cells (48). The LTRs analyzed exhibited big differences between them. All gammaretroviruses had at least one NF-κB, enFeLV had 4–5 NF-κB, but exFeLV only had 1–2 (Table 3).
Ets-1 is expressed at high levels mainly in tissues related to the immune response such as thymus, spleen and lymph nodes and may block differentiation of B- and T-cells (49, 50). The presence of the Ets-1 TBS as identified by ALGGEN was numerous but very variable, even within different LTRs of the same species.
Factors related to hormone stimulation
The retroviruses with the highest number of glucocorticoid response element (GR) and GR-alpha TBS were MuLV (Table 3). The relationship between retroviruses and glucocorticoids (GC) was first reported when it was discovered that GC stimulated the expression and budding of mouse mammary tumor virus (MMTV) (51). MuLV, a gammaretrovirus, is distantly related to MMTV, a betaretrovirus, but it is interesting that MuLV, like MMTV, had an overrepresentation of GR when compared to the other viruses studied. GR also responds to progesterone, androgens and mineralocorticoids (52), so the array of situations which may affect MuLV expression through this site is notable; these situations are associated with development, metabolism and even the immune response. exFeLV also had two or three GR sites, vs none in enFeLV (Table 3). The similarity between the number of GR sites in MuLV and in exFeLV may again reflect the acquisition of an ancestral rodent virus by a felid predecessor (20), which may have endogenized only when mutations in this site abrogated its function.
Differences in the U3 regions of the LTRs of enFeLV and exFeLV
The conservation of these regions in enFeLV does not mean that the TBS are functional, but it would highlight their importance in the transcription machinery of these viruses. Some of the mechanisms that may abrogate the functionality of TBS are the methylation and subsequent deamination of CpG dinucleotides, which alter the binding sites of transcription factors (53), as well as mutations and deletions (54). A TBS search with ALGGEN of the 26-nt and 13-nt deletions present in all the C-sequences and in the sequence in feline chromosome D2 compared to AY364318 and AY364319, respectively, and of a 21-nt deletion also present in the same area of exFeLV compared to AY364318 (Figure S1), indicated that this area did not contain TBS. Thus, the indels had occurred in an area of the LTR potentially irrelevant for transcription. Because the C-30 sequence was the least homologous to the other enFeLV sequences (Figure S1), we studied its TBS separately from those of the other C-sequences. The analysis showed that the TBS were very similar to those in the other C-sequences, except that an NF-1 site located from nucleotide −303 to −308 in all other C-sequences had disappeared in C-30.
Distribution of TBS within the U3 regions
When analyzing the distribution of TBS in the different LTRs, "hot spot" regions with six or more overlapping TBS were detected in all of them (Figs. 2 and 3). For example, the C-sequences and the other enFeLV had three of those clusters, at nucleotides −372 to −365 (cluster of 10 TBS), at −313 to −305 (cluster of 12 TBS), and at −295 to −290 (cluster of 12 TBS). All three clusters included many TBS related to innate and adaptive immunity. The clustering of TBS may be a consequence of cooperation between sites, as stated above for LVb/core/NF1/GRE and NF-AT/AP-1.
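One way to make the "six or more overlapping TBS" criterion concrete is a sweep over site coordinates that reports every stretch where the number of overlapping sites reaches a threshold. This is a minimal sketch under stated assumptions: the function name `hot_spots`, the `(name, start, end)` input format with inclusive coordinates, and the toy sites are all hypothetical, and the procedure is not necessarily the one used to produce Figs. 2 and 3.

```python
def hot_spots(sites, min_depth=6):
    """Find regions covered by at least `min_depth` overlapping sites.

    `sites` is a list of (name, start, end) with inclusive integer
    coordinates and start <= end (upstream positions are negative).
    Returns a list of (region_start, region_end, peak_depth).
    """
    events = []
    for _name, start, end in sites:
        events.append((start, 1))       # site begins covering at `start`
        events.append((end + 1, -1))    # site stops covering after `end`
    events.sort()

    regions = []
    depth = 0
    region_start = None
    peak = 0
    i = 0
    while i < len(events):
        pos = events[i][0]
        # Apply every change that happens at this coordinate.
        while i < len(events) and events[i][0] == pos:
            depth += events[i][1]
            i += 1
        if depth >= min_depth:
            if region_start is None:
                region_start, peak = pos, depth
            else:
                peak = max(peak, depth)
        elif region_start is not None:
            regions.append((region_start, pos - 1, peak))
            region_start = None
    return regions


# Toy coordinates (not taken from the study).
toy_sites = [
    ("A", -313, -305), ("B", -312, -306), ("C", -311, -305),
    ("D", -310, -304), ("E", -310, -306), ("F", -309, -305),
    ("G", -200, -195),
]
print(hot_spots(toy_sites))
```

Lowering `min_depth` to 1 turns the same function into a way of listing every region that contains at least one predicted site.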
In conclusion, a similar number and type of TBS were found in feline enFeLV (including 8 C-sequences from Spanish cats and an LTR-like sequence in the feline chromosome D2) and exFeLV LTRs. The study benefits from the existence of enFeLV, very closely related to exFeLV, making comparisons between both very interesting. The integrity of most TBS in enFeLV LTRs and their great resemblance to those present in exFeLV may suggest that they may work as regulatory elements in cis to control the expression of multiple genes. A similar number and type of TBS were also found in other gammaretroviruses that affect different species (GALV, MuLV, KoRV). A high degree of conservation in some TBS was observed in most of the viruses studied, even in the endogenous forms, which could be related to the biology and the replication strategies of these viruses or to the evolutionary adaptation to the cellular host. However, the presence of a particular TBS in the endogenous retroviral sequences does not necessarily mean that they can respond to cellular factors or signals or could even represent a potential risk in case of infection with exogenous retrovirus homologs. To determine whether the TBS of endogenous sequences are actually functional, additional analyses should be performed studying the expression of messenger RNA, from which the functional protein would be synthesized, after stimulating the host cell or after cloning in a vector.