Genetic Characterization and Recombinant History of a Novel Emerging HIV-1 Second-Generation Circulating Recombinant Form (CRF120_0107) Identified Among Sexually Transmitted Infections in Shenzhen, China

Against the background of co-circulation of the main epidemic HIV strains in China (CRF01_AE and CRF07_BC), a growing number of HIV second-generation recombinant (SGR) strains with CRF01_AE and CRF07_BC as the backbone are emerging. In this study, we analyzed the characteristics and evolutionary history of a newly emerging HIV-1 recombinant, CRF120_0107, identified in Shenzhen, Guangdong Province, China, based on near full-length genome (NFLG) sequences. NFLG phylogenetic analysis revealed that these sequences formed a distinct monophyletic branch with a high bootstrap value (>90%), distantly related to all known HIV-1 genotypes. Recombination analysis showed that CRF120_0107 was composed of the two predominant HIV-1 strains in China, CRF01_AE and CRF07_BC. Subregional phylogenetic analysis further showed that the possible parental lineages of the three CRF07_BC segments belonged to the CRF07_BC cluster circulating among men who have sex with men (MSM), and that the CRF01_AE segments also mainly belonged to MSM clusters (such as CRF01_AE Cluster 5). Bayesian analysis placed the emergence of CRF120_0107 in Shenzhen at approximately 2009-2011. The appearance of CRF120_0107 indicates that HIV-1 SGR strains containing CRF01_AE and CRF07_BC are likely to be generated with increasing frequency and may accelerate the spread of HIV in China. It is therefore necessary to monitor high-risk MSM with HIV-1 CRF01_AE and CRF07_BC dual infection to prevent the generation of CRF01_AE/CRF07_BC recombinant strains, thereby reducing the possibility of HIV-1 genotypic resistance and the complexity of treatment in China.


Background
The co-circulation of multiple strains of the human immunodeficiency virus (HIV) facilitates the generation of various inter-genotype unique recombinant forms (URFs) and novel circulating recombinant forms (CRFs) [1], which have an important impact on HIV genetics and evolution. HIV is characterized by extremely high genetic variability and rapid replication. To date, 118 CRFs and numerous URFs have been reported worldwide, and all of them are included in the Los Alamos National Laboratory (LANL) HIV Sequence Database (http://www.hiv.lanl.gov/). In China, the distribution of HIV-1 genotypes has changed dramatically over time; two CRFs in particular, CRF01_AE and CRF07_BC, have driven national or regional HIV-1 epidemics in recent years [2].
Shenzhen, one of the four central cities in the Guangdong-Hong Kong-Macao Greater Bay Area (GBA), is located on the coast of Guangdong Province in South China, where the HIV-1 epidemic has become increasingly complicated. Since 2010, CRF07_BC has replaced CRF01_AE as the most prevalent strain among HIV-1-infected men who have sex with men (MSM) in Shenzhen [3]. In addition, novel HIV-1 second-generation recombinant (SGR) strains with CRF01_AE or CRF07_BC as the backbone have been discovered in Shenzhen [4], suggesting that CRF01_AE and CRF07_BC are recombining at a higher frequency during the HIV-1 epidemic and that adaptive mutation and evolution are ongoing. In this study, we analyzed the characteristics and evolutionary history of a newly emerging HIV-1 recombinant, CRF120_0107, composed of CRF01_AE and CRF07_BC, in Shenzhen, Guangdong Province, China.

Methods
Three patients (Patient IDs: LS14873, LS15083, and LS16846) infected with the novel HIV-1 recombinant strains were included in this study. All participants provided written informed consent. All three were males aged 31 to 38 years who were infected through sexual transmission in Shenzhen, China. There was no direct epidemiological linkage among them. The demographic and epidemiological data of these patients are shown in Table 1.

Results And Discussion
Phylogenetic analysis of the four sequences (LS14873, LS15083, LS16846, and 13SJ011) showed that they did not cluster with any known HIV-1 subtype or CRF and instead formed an independent cluster with a high bootstrap value (>90%), indicating that they may belong to a novel genotype (Fig. 1A). BLAST and recombination analyses then revealed that these strains were composed of HIV-1 CRF01_AE and CRF07_BC segments and shared five identical breakpoints delimiting six segments (three CRF01_AE and three CRF07_BC), suggesting a new CRF (Fig. 1B).
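The relationship between breakpoints and segments can be illustrated with a minimal Python sketch: with two parental lineages, five breakpoints split the genome into six alternating segments, three from each parent. The function name `mosaic_segments` is hypothetical. In the example call, the breakpoints 1817, 2022, and 3455 are read off the two CRF07_BC coordinate ranges given in this section (HXB2 790-1817 nt and 2023-3455 nt), whereas the breakpoints 5000 and 7000 and the end position 9000 are placeholders chosen for illustration and are not values reported by the study.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, str]  # (first position, last position, parental lineage)


def mosaic_segments(start: int, end: int, breakpoints: Sequence[int],
                    parents: Tuple[str, str]) -> List[Segment]:
    """Cut [start, end] at each breakpoint and alternate the two parents.

    A breakpoint b is taken as the last position of one segment, so the next
    segment begins at b + 1 (a convention assumed for this sketch).
    """
    cuts = list(breakpoints)
    if cuts != sorted(set(cuts)):
        raise ValueError("breakpoints must be strictly increasing")
    if cuts and not (start <= cuts[0] and cuts[-1] < end):
        raise ValueError("breakpoints must lie inside [start, end)")
    segments: List[Segment] = []
    seg_start = start
    for i, seg_end in enumerate(cuts + [end]):
        # Adjacent segments always come from different parents.
        segments.append((seg_start, seg_end, parents[i % 2]))
        seg_start = seg_end + 1
    return segments


# 1817, 2022 and 3455 follow from the CRF07_BC ranges quoted in the text;
# 5000, 7000 and the end position 9000 are HYPOTHETICAL placeholders.
example = mosaic_segments(790, 9000, [1817, 2022, 3455, 5000, 7000],
                          ("CRF07_BC", "CRF01_AE"))
for first, last, parent in example:
    print(f"{first}-{last}\t{parent}")
```

Because a breakpoint by definition separates regions of different parental origin, strict alternation is the only arrangement possible with two parents, which is why five shared breakpoints imply the three-plus-three segment composition described above.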
Bayesian analysis was performed to estimate the time to the most recent common ancestor (tMRCA).
Only the three CRF01_AE segments of the two NFLGs (LS14873 and LS16846) were selected for the time-scaled evolutionary analysis, because the single CRF01_AE segment contained in the other two, non-NFLG sequences (13SJ011 and LS15083) was too short to yield a sufficient temporal signal in TempEst (R² < 0.5). Similarly, for CRF07_BC we selected the two segments at HXB2 790-1817 nt and HXB2 2023-3455 nt for the Bayesian analysis, because all sequences include both segments and the resulting data set showed an adequate temporal signal (R² > 0.7). Maximum clade credibility (MCC) trees were reconstructed separately for the CRF01_AE segments and for the CRF07_BC segments (HXB2 790-1817 nt and 2023-3455 nt). CRF01_AE and CRF07_BC reference sequences with known sampling years were downloaded from the LANL HIV Sequence Database or obtained from HIV-infected blood samples collected by our laboratory, and all were included in the phylodynamic analysis together with the target sequences of this study.
As shown in Fig. 3, 13SJ011 did not cluster with the three sequences obtained in Shenzhen; it may represent an earlier ancestor of the three Shenzhen sequences, or it may be of distinct origin while sharing the same recombination breakpoints. The sequence is too short to determine which. Because the three patients have no direct epidemiological relationship and the strains infecting them share the same mosaic structure, the three recombinant strains from Shenzhen meet the HIV-1 CRF nomenclature proposal [15]; they have been accepted by the HIV database and designated as a new CRF (CRF120_0107). Given that CRF120_0107 has not previously been described in other countries or regions, the results suggest that CRF120_0107 emerged in Shenzhen at approximately 2009-2011 (Fig. 3).
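The temporal-signal screen used to choose segments for the Bayesian analysis rests on a root-to-tip regression: the genetic distance of each tip from the root is regressed on its sampling date, and R² measures how clock-like the data are. The following Python sketch shows that calculation only. The function names are hypothetical, the default cut-off of 0.5 is an assumption taken from the thresholds quoted in the text, and the sketch does not reproduce TempEst's search for the best-fitting root.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(dates: Sequence[float],
                           distances: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, r_squared). The slope is a crude estimate of the
    substitution rate in substitutions per site per unit of date.
    """
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need at least three (date, distance) pairs")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and distances must both vary")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def has_temporal_signal(dates: Sequence[float], distances: Sequence[float],
                        threshold: float = 0.5) -> bool:
    """True when divergence increases with time and R^2 reaches the cut-off.

    The 0.5 default is an ASSUMPTION based on the values quoted in the text.
    """
    slope, r2 = root_to_tip_regression(dates, distances)
    return slope > 0 and r2 >= threshold


# HYPOTHETICAL sampling years and root-to-tip distances, for illustration only.
years = [2008.0, 2010.0, 2012.0, 2014.0]
clock_like = [0.010, 0.020, 0.030, 0.040]
noisy = [0.020, 0.010, 0.020, 0.010]
print(has_temporal_signal(years, clock_like))
print(has_temporal_signal(years, noisy))
```

Requiring a positive slope as well as a high R² guards against data sets in which divergence appears to decrease over time, which would fit a line well but carry no usable clock signal.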
With its rapid economic development and large mobile population, Shenzhen has great potential for the spread of HIV locally and even across the country. The origin and outbreak of CRF55_01B among MSM in Shenzhen has changed HIV-1 molecular epidemiology patterns in the region, and perhaps across the nation [16,17]. In addition, since the discovery of CRF01_AE/CRF07_BC recombinants, the MSM population has contributed to the emergence of most URFs_0107 and CRFs_0107. Second-generation CRFs such as CRF120_0107 have been identified especially among the MSM population, which has become a key high-risk group for the formation and spread of recombinants. The apparent advantage of these strains in arising and spreading among MSM in the region may lay the groundwork for high transmission in the future. At the same time, the generation of HIV-1 second-generation recombinant forms (CRF01_AE/CRF07_BC) acts as an evolutionary force that increases HIV diversity and complexity in this region.

Conclusion
In this study, we identified and characterized for the first time the newly emerging HIV-1 CRF120_0107, composed of CRF01_AE and CRF07_BC, in Shenzhen, China, and analyzed its evolutionary history.

Subregion phylogenetic analysis based on CRF01_AE or CRF07_BC segments of HIV-1 CRF120_0107. Subregion neighbor-joining trees were constructed using the Kimura 2-parameter model of nucleotide substitution with 1000 bootstrap replicates in MEGA v6 and visualized with iTOL. Three regions represent CRF07_BC segments and the other three represent CRF01_AE segments. Subtypes A-D, F-H, J-L, CRF01_AE, CRF07_BC, and HIV-1 group O were used as references. The CRF01_AE or CRF07_BC segment sequences of CRF120_0107 and of the suspected CRF120_0107 sequence (13SJ011) are marked with "■", and the corresponding sequence labels are shown on a cyan (CRF01_AE) or pink (CRF07_BC) background. Only bootstrap values ≥90% are shown at the nodes of the tree, marked with a light purple solid circle.

Evolutionary analysis based on CRF01_AE or CRF07_BC segments of HIV-1 CRF120_0107. MCC trees are shown for the recombinant fragments of the CRF01_AE regions and the CRF07_BC regions (HXB2 790-1817 nt and 2023-3455 nt). The timescale is shown below each tree. The mean tMRCA and 95% highest posterior density (HPD) interval are displayed for the nodes of CRF120_0107 and the suspected CRF120_0107 sequence (13SJ011). CRF120_0107 strains and the suspected CRF120_0107 strain (13SJ011) are highlighted in cyan for the CRF01_AE region and in pink for the CRF07_BC region.
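The Kimura 2-parameter model named in the phylogenetic figure caption corrects the observed proportion of differing sites for multiple substitutions while treating transitions and transversions separately: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The Python sketch below computes this pairwise distance for two aligned sequences. The function name is hypothetical, and skipping any site that is not A, C, G, or T in both sequences is an assumption of this sketch that may not match the gap-handling option used in MEGA.

```python
import math

# Purine-purine and pyrimidine-pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # ASSUMPTION: sites with gaps or ambiguity codes are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Toy sequences for illustration only; they are not data from the study.
print(k2p_distance("ACGTACGT", "ACGTACGT"))
print(k2p_distance("ACGTACGT", "GCGTACGT"))
```

A neighbor-joining tree is then built from the matrix of such pairwise distances, and bootstrap support is obtained by recomputing the matrix and tree on alignment columns resampled with replacement.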