Emergence of HIV-1 C/A1 and C/A1/D circulating recombinant forms, and dominance of subtype C and R5 use from whole genome sequence analysis in Addis Ababa

The HIV epidemic in Ethiopia is dominated by subtype C, with sporadic reports of subtypes A and D. The presence of subtypes A and D may lead to the emergence of recombinant viruses and increase genetic diversity, which makes monitoring the HIV epidemic and developing vaccines and therapeutics difficult. This study aimed to determine the subtypes, circulating recombinant forms (CRFs), and dominant coreceptor use in Addis Ababa, Ethiopia. Participants with a range of purposely selected CD4+ T-cell counts were included. Chi-square and Mann-Whitney tests were used. Whole genome next-generation sequencing (NGS) of HIV was performed using a PCR amplification method and Illumina MiSeq. Subtyping and recombination scanning were done with the REGA subtyping tool version 3.0. Coreceptor usage was predicted using the Geno2Pheno clonal model and PhenoSeq. Signature amino acids and positive charges were also used in the tropism prediction. Phylogenetic analyses were conducted with MEGA version 6 using maximum likelihood with the neighbor-joining (N-J) methods. The study confirmed that the dominant subtype in Addis Ababa is HIV-1 subtype C. In addition, HIV-1 subtype A1 and the CRFs C/A1 and C/A1/D were identified. R5-tropic viruses were dominant and were associated with a higher CD4 T-cell count and lower viral load. Further studies on HIV subtypes and CRFs will be essential to fully understand HIV/AIDS epidemiology. In addition, the tropism information is important in Ethiopia if the use of the coreceptor antagonist maraviroc is planned.


Introduction
Current HIV prevalence data for Ethiopia [1] estimate that there are 610,335 people living with HIV (PLHIV), with an adult HIV prevalence of around 1%. The HIV epidemic in Ethiopia is dominated by subtype C [2,3,4], which also accounts for more than 50% of the global pandemic [5]. However, sporadic infections with other subtypes, A and D, have also been reported [2,3]. Subtypes A and D predominate in East and Central African countries such as Kenya, Uganda, and Tanzania [6,7,8]. In Djibouti, subtypes A and C are reported [9]. Although an effort was made to detect recombinants by generating the first full-length subtype C sequence from a 1986 Ethiopian sample [10], the first evidence of a subtype A/C recombinant, from a 1991 Addis Ababa sample, was reported in 1998 by Sherefa et al. [11].
Genetic subtypes may differ in important biological properties such as virulence, tissue tropism and transmissibility [12]. Ethiopian patients with HIV-1 subtype C harbor a remarkably low frequency of syncytium-inducing (SI) CXCR4 phenotype viruses [13]. There is a strong correlation between viral tropism and progression to disease [14,15]. Non-syncytium-inducing (NSI) HIV-1 strains primarily use the CCR5 receptor, competing with β-chemokines including regulated on activation, normal T expressed and secreted (RANTES) and macrophage inflammatory protein-1α and -1β (MIP-1α and MIP-1β), while SI strains use CXCR4 in competition with α-chemokines, for example, stromal cell-derived factor 1 (SDF-1) [16]. For prediction of coreceptor usage, different bioinformatics algorithms have been developed [17,18]. The presence of lysine and arginine amino acids at positions 11/24/25 of the V3 loop, combined with the net charge of V3, is also used [19].
Individuals who are homozygous for the CCR5 delta32 deletion are relatively resistant to HIV infection, which makes CCR5 one of the therapeutic targets. Individuals who are heterozygous for the deletion have reduced expression of CCR5, slower declines in CD4 T cell count and slower progression to AIDS [20,21,22].
The CCR5 antagonist maraviroc has previously been used for salvage therapy in those who have failed first- and second-line treatment regimens. It has also been evaluated as an anti-inflammatory agent with potential use in liver steatosis and cognitive impairment, with positive results [23,24]. Therefore, generating information on the type of viral strain dominant in people living with HIV is important to determine the feasibility of CCR5 antagonist use in the Ethiopian context.
HIV diversity is caused mainly by the accumulation of point mutations introduced by the error-prone HIV-1 reverse transcriptase during replication [25,26]. This is amplified by the virus's high replication rate: about 10^10 virions are produced each day, which increases the rate of error introduction [27]. The most strongly conserved residues in the V3 loop are the two cysteine residues, the GPGX motif at the tip of the V3, and the N-linked glycosylation site adjacent to the first cysteine residue. The number of charges and glycosylation in the V3 loop can affect cellular tropism and neutralization by antibodies [28]. The extensive genetic diversity of the variants within an individual over time and the emergence of recombinant viruses have made the development of medical interventions much more difficult. This diversity may also enable HIV both to overcome the immune response and to develop resistance to antiviral agents. In addition, it complicates the development of vaccine(s), diagnostics and therapeutics [29,30,31]. An in-depth study of HIV genetic variation and subtype classification, together with a better understanding of circulating recombinant forms (CRFs) and other recombinant genomes, is necessary for monitoring the HIV/AIDS epidemic and understanding its epidemiology. Therefore, this study aimed to determine the HIV-1 subtypes and the dominant coreceptor tropism of the viruses circulating in Addis Ababa.

Study setting, design and population
Whole genome sequencing of HIV-1 was performed on 60 samples selected from 594 drug-naïve participants of a cross-sectional study [32,33] to determine subtypes, CRFs, tropism and glycosylation site diversity. The samples were collected from four participating hospitals in Addis Ababa, Ethiopia: the All African Leprosy Rehabilitation and Training Centre (ALERT), and Saint Paul, Yekatit-12 and Zewditu hospitals. Samples were selected based on CD4 T cell count and WHO stage categories: <200 and 200-349 cells/µL with WHO stages 1-4, and 350-500 and >500 cells/µL with WHO stages 1-2. Study participants were selected from a treatment-naïve HIV-positive cohort in Addis Ababa, Ethiopia. Participants were included if they had been followed up in clinic for at least three years and had a stored plasma sample with HIV-1 RNA >1,000 copies/mL available for sequencing. ART-naïve participants were available because the test-and-treat strategy had not yet been started. Ethical approval for the study was
HIV-1 RNA amplification and sequencing: RNA was extracted and purified from the plasma samples with HIV RNA load. The extracted RNA was reverse transcribed and the complementary deoxyribonucleic acid (cDNA) was processed by the polymerase chain reaction-amplicon (PCR-amplicon) method to create a template library. Next generation sequencing (NGS) was performed using an Illumina MiSeq. The sequences were generated using a PCR method with four overlapping amplicons spanning the whole genome [34].
Genome assembly: Quality control checks and trimming were done on the raw reads, followed by assembly of consensus sequences and mapping of reads onto the consensus for variant calling. Reads of low quality or below a minimum length (50 bases) were trimmed, and the reads were aligned to both human and HIV genomes. Raw reads that aligned with the human genome were discarded to avoid contamination. After this preparation, NGS raw reads were assembled into genomes using the iterative viral assembler (IVA) method [35]. The contigs produced by IVA were aligned to a database of full-length HIV genomes to select a reference for gap filling. The contigs were mapped and aligned to the selected reference genome and the draft genomes were constructed. Gap filling was then done by aligning the good-quality reads onto the draft genome and replacing bases from the reference with those from the reads. Finally, gap filling was repeated up to a maximum of 10 times to obtain the final consensus genome [35].
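The read filtering and capped gap-filling loop described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `Read` class, the `maps_to_human` flag and the `fill_once` callable are hypothetical stand-ins for the aligner and read-mapping steps, reads below 50 bases are assumed to be dropped, and stopping early once the draft no longer changes is an assumption (the text states only the maximum of 10 repeats).

```python
from dataclasses import dataclass

MIN_READ_LENGTH = 50      # minimum read length stated in the text
MAX_GAP_FILL_ROUNDS = 10  # maximum number of gap-filling repeats stated in the text


@dataclass
class Read:
    name: str
    sequence: str
    maps_to_human: bool  # hypothetical flag, assumed to be set by an upstream aligner


def filter_reads(reads):
    """Keep reads that meet the length cutoff and did not align to the human genome."""
    return [
        r for r in reads
        if len(r.sequence) >= MIN_READ_LENGTH and not r.maps_to_human
    ]


def fill_gaps(draft, fill_once, max_rounds=MAX_GAP_FILL_ROUNDS):
    """Repeat a gap-filling step until the draft stops changing or the cap is reached.

    `fill_once` is a hypothetical callable standing in for the read-mapping and
    base-replacement step; it takes a draft genome string and returns a new one.
    Returns the final draft and the number of rounds that were run.
    """
    rounds = 0
    while rounds < max_rounds:
        updated = fill_once(draft)
        rounds += 1
        if updated == draft:  # assumed convergence check, not stated in the text
            break
        draft = updated
    return draft, rounds
```

In practice `fill_once` would wrap a read mapper and a consensus caller; here it is left abstract so that only the control flow (filter, then iterate with a cap) is shown.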
Assembled sequence analysis: Consensus sequence analysis was done to identify subtypes, CRFs, tropism and resistance mutations. Subtyping and recombination scanning were done with the REGA subtyping tool V3.0 [36]. The V3 loop sequence was extracted from the whole genome using the Gene Cutter program [37]. Coreceptor usage was predicted using the Geno2Pheno clonal model [17] and PhenoSeq [18]. In the 35-amino-acid V3 loop, predicted positive charges <5.0 suggest that the virus is macrophage tropic, whereas positive charges ≥5.0 suggest that the virus is T cell tropic [19]. Lysine and arginine amino acids at positions 11, 24, and 25 in the V3 loop are defined and used as signature amino acids for the determination of SI and NSI phenotypes [38]. With Geno2Pheno, sequences with a false positive rate (FPR) below 10% were considered X4-tropic. PhenoSeq was used to identify the virus as X4-using or non-X4-using.
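The charge and signature-residue rules above can be illustrated with a short sketch. It makes several assumptions: the net charge is computed as (K + R) minus (D + E), which the text does not define exactly; the two example V3 sequences are illustrative and are not taken from the study data; and sequences at or above the 10% FPR cutoff are labeled R5 here, whereas the text states only the X4 side of the rule. The FPR value itself would come from Geno2Pheno, which this code does not call.

```python
POSITIVE = set("KR")   # lysine and arginine
NEGATIVE = set("DE")   # aspartate and glutamate (assumed to offset the charge)
SIGNATURE_POSITIONS = (11, 24, 25)  # 1-based V3 positions named in the text


def v3_net_charge(v3):
    """Net V3 charge as (K + R) minus (D + E); the formula is an assumption."""
    v3 = v3.upper()
    return sum(aa in POSITIVE for aa in v3) - sum(aa in NEGATIVE for aa in v3)


def has_signature_residue(v3):
    """True if lysine or arginine occurs at position 11, 24 or 25."""
    v3 = v3.upper()
    return any(v3[p - 1] in POSITIVE for p in SIGNATURE_POSITIONS if p <= len(v3))


def classify_by_charge(v3, cutoff=5):
    """Apply the charge rule: >= cutoff is T cell tropic, otherwise macrophage tropic."""
    return "T cell tropic" if v3_net_charge(v3) >= cutoff else "macrophage tropic"


def classify_by_fpr(fpr_percent, cutoff=10.0):
    """Apply the Geno2Pheno FPR rule: below the cutoff is X4 (R5 otherwise, by assumption)."""
    return "X4" if fpr_percent < cutoff else "R5"


# Illustrative 35-residue V3 sequences (hypothetical, not study data).
EXAMPLE_R5_LIKE = "CTRPNNNTRKSIRIGPGQTFYATGDIIGDIRQAHC"
EXAMPLE_X4_LIKE = "CTRPNNNTRKRIRIGPGQTFYATGKIIGDIRQAHC"  # R at 11, K at 25
```

For the first example the net charge is 3 with no signature residue, so the rules point to an NSI, macrophage-tropic virus; for the second the net charge is 6 with basic residues at positions 11 and 25, so they point to an SI, T cell tropic virus.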
Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 6 [39], with inclusion of reference sequences from subtypes A, C, and D and recombinants of these subtypes. The tree was generated using maximum likelihood with the neighbor-joining (N-J) methods with 1000 bootstrap replications. The Stanford HIV Drug Resistance Database was used to detect drug resistance mutations [40].

Subtyping and circulating recombinant forms
Among the 60 HIV whole genomes sequenced, 49 were subtype C (81.7%), one was subtype A1 (1.7%), six were C/A1 recombinants (10.0%) and three were C/A1/D recombinants (5.0%). One of the sequences could not be assigned to a subtype by REGA (Figures 1 and 2; Table 2). The phylogenetic tree generated by the N-J method indicated that the viruses of 58 sequences belonged to subtype C, except for the virus from subject AL-062, who was infected with HIV-1 subtype A. The branch lengths between plasma-derived sequences in the phylogenetic tree were not the same in all subjects (Figure 2).
Therefore, all three of the samples (AL-108, SP-065 and ZM-028) carried X4-tropic viruses.

Discussion
In this study, subtype C was the dominant subtype identified. In addition, subtype A1 and the circulating recombinant forms C/A1 and C/A1/D were identified. This finding is in concordance with other studies indicating that subtype C is the dominant HIV-1 variant circulating in Ethiopia [4,41,42]. However, sporadic infections with other subtypes, A and D, have also been reported [2,3,19]. The detection of CRFs in this study indicates the emergence of recombinants of the predominant subtypes A, C and D circulating in the East African region. The emergence of such recombinants is plausible considering the frequently reported subtypes in Ethiopia and the possible influx of other subtypes from the neighboring countries [6,7,8,9].
This study also showed that use of the CCR5 coreceptor is dominant. This finding is also in concordance with other studies showing that subtype C differs from the other subtypes by its lack of ability to use coreceptors other than CCR5 [13,43,44], and that HIV-1 subtype C frequently uses the CCR5 coreceptor for cell entry even in patients with advanced immunodeficiency [45,46,47]. Preferential transmissibility of certain NSI isolates compared with more pathogenic SI isolates may be one explanation for this finding.
However, it is also possible that the primary immune response after HIV infection is more efficient in eliminating SI viruses than NSI viruses [38]. Another reason may be the differential expression of RANTES, MIP-1α and MIP-1β, which compete with HIV for access to cell surface CCR5, and of SDF-1, which competes for CXCR4 [16,48]. Large differences were detected between the bioinformatics tools used in this study in the predicted coreceptor usage of subtype C. This discrepancy is likely due to the different statistical models and the way they handle insertions, deletions and ambiguous positions [49].
In this setting, HIV-1 C subtype R5-tropic viruses predominate. Those with X4-tropic infections were more likely to have lower CD4+ cell counts and higher viral loads.
Positively charged signature amino acids at positions 11, 24, and 25 [38], a net positive charge ≥5.0 [19], and the potential N-linked glycosylation site within the V3 loop [28] are predictive markers for T cell tropism of the viral isolates. The net charge of the V3 loop and the lack of positively charged amino acids at positions 11, 24, and 25 indicated that almost all study subjects carried NSI viruses [38]. This finding will be clinically relevant if the CCR5 antagonist maraviroc is adopted for use in Ethiopia.
Amino acid changes in the charged V3 loop that determine cellular tropism, and glycosylation differences in the V3 loop that allow escape from recognition by neutralizing antibodies, can result from different immune pressures or differences in coreceptor usage [28,50]. This may affect transmission of the virus and lead to disease progression in the presence of neutralizing antibodies [14,15]. In addition, the branch lengths between sequences in the phylogenetic tree differed, which indicates high genetic diversity within the dominant subtype C. This genetic diversity needs to be considered in the development of vaccines. These differences could also contribute to viral escape from immunity and to the challenges in developing efficacious vaccine(s). The phenotypes predicted using bioinformatics tools, signature amino acids and net positive charge confirm the low prevalence of CXCR4 usage. This has been observed in HIV-1 subtype C from Ethiopian AIDS patients in some studies, in contrast to other HIV-1 subtypes [13,43,44].
In conclusion, the epidemic in Addis Ababa is still dominated by HIV-1 subtype C. In addition, HIV-1 subtype A1 and the circulating recombinant forms C/A1 and C/A1/D were identified. Therefore, continued studies on HIV genetic variation, subtypes and CRFs will be of paramount importance for understanding HIV/AIDS epidemiology, for vaccine design, and for detecting genetic determinants related to a particular HIV. Furthermore, the dominance of R5-tropic viruses was detected. This is important in Ethiopia if the coreceptor antagonist maraviroc is planned for future use, with a high FPR% applied to decrease the risk of using maraviroc in patients with X4 virus.

Neighbor-joining tree demonstrating the evolutionary relationships and distances among the HIV-1 genome consensus sequences. Sixty sequences from plasma samples were used, with subtype C, subtype A1, subtype D and AC, AD, CD and ACD circulating recombinant forms from the Los Alamos database as reference sequences. The scale bar represents a genetic distance of 2%.