Phylogenetic analysis showed that these four sequences did not cluster with any known HIV-1 subtype or CRF. Instead, they formed an independent cluster with a high bootstrap value (>90%), indicating that they may belong to a novel genotype (Fig. 1A). BLAST and recombination analyses further revealed that these strains are composed of HIV-1 CRF01_AE and CRF07_BC segments and may constitute a new CRF. All four share five identical breakpoints, which divide the genome into three CRF01_AE and three CRF07_BC segments (Fig. 1B).
Using HXB2 as the reference, the recombinant segments were: Ⅰ, CRF07_BC (790-1817 nt); Ⅱ, CRF01_AE (1818-2022 nt); Ⅲ, CRF07_BC (2023-5759 nt); Ⅳ, CRF01_AE (5760-6166 nt); Ⅴ, CRF07_BC (6167-8367 nt); and Ⅵ, CRF01_AE (8368-9411 nt) (Fig. 1C). As shown in Fig. 2, subregional phylogenetic analysis indicated that the possible parental lineages of the CRF07_BC segments (Ⅰ, Ⅲ, and Ⅴ) belonged to the CRF07_BC MSM cluster, and that the CRF01_AE segments Ⅳ and Ⅵ also belonged to an MSM cluster (CRF01_AE Cluster 5)[14]. Segment Ⅱ gave a less clear result, possibly because it is too short or its sequence similarity is insufficient: only sample 13SJ011 fell within CRF01_AE Cluster 4 (an MSM cluster), whereas the other sequences formed a separate cluster of their own within CRF01_AE. Overall, the subregional phylogenetic analysis revealed that all recombinant segments are closely related to clusters associated with sexually transmitted populations, especially MSM.
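As a simple consistency check on the coordinates reported above, the Python sketch below tabulates the six segments, computes their lengths from the inclusive HXB2 coordinates, and recovers the breakpoints. The coordinates are taken from the text; the data layout, the ASCII segment labels, and the function names are illustrative assumptions. This is bookkeeping only, not the recombination-detection method used in the study. It also shows that segment Ⅱ, at 205 nt, is by far the shortest segment, which is consistent with the caveat about its phylogenetic placement.

```python
# HXB2 coordinates (inclusive, nt) of the six segments as reported in the text.
# Labels "I".."VI" are ASCII stand-ins for the Roman numerals used in the paper.
SEGMENTS = [
    ("I", "CRF07_BC", 790, 1817),
    ("II", "CRF01_AE", 1818, 2022),
    ("III", "CRF07_BC", 2023, 5759),
    ("IV", "CRF01_AE", 5760, 6166),
    ("V", "CRF07_BC", 6167, 8367),
    ("VI", "CRF01_AE", 8368, 9411),
]


def segment_lengths(segments):
    """Return {segment label: length in nt}; coordinates are inclusive."""
    return {name: end - start + 1 for name, _, start, end in segments}


def breakpoints(segments):
    """Return the last HXB2 position before each switch of parental lineage.

    Also checks that adjacent segments are contiguous (no gaps or overlaps).
    """
    bps = []
    for (_, p1, _, end1), (_, p2, start2, _) in zip(segments, segments[1:]):
        if start2 != end1 + 1:
            raise ValueError(f"gap or overlap between {end1} and {start2}")
        if p1 != p2:
            bps.append(end1)
    return bps


def lineage_composition(segments):
    """Count how many segments derive from each parental CRF."""
    counts = {}
    for _, parent, _, _ in segments:
        counts[parent] = counts.get(parent, 0) + 1
    return counts


print(segment_lengths(SEGMENTS))
# {'I': 1028, 'II': 205, 'III': 3737, 'IV': 407, 'V': 2201, 'VI': 1044}
print(breakpoints(SEGMENTS))
# [1817, 2022, 5759, 6166, 8367]  -> five breakpoints
print(lineage_composition(SEGMENTS))
# {'CRF07_BC': 3, 'CRF01_AE': 3}
```

Because every adjacent pair alternates parental lineage, each junction counts as a breakpoint, giving five breakpoints and six segments.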
Bayesian analysis was performed to estimate the time to the most recent common ancestor (tMRCA). For the CRF01_AE regions, only segments Ⅱ, Ⅳ, and Ⅵ of the two NFLGs (LS14873 and LS16846) were used in the time-scaled analysis. The two non-NFLG sequences (13SJ011 and LS15083) contain only one CRF01_AE segment (Ⅱ), which is too short to provide sufficient temporal signal in TempEst (R² < 0.5). For the CRF07_BC regions, we used segments Ⅰ (HXB2: 790-1817 nt) and Ⅲ (HXB2: 2023-3455 nt), because all sequences include both segments and this sequence set showed sufficient temporal signal (R² > 0.7). Maximum clade credibility (MCC) trees were then reconstructed separately for the CRF01_AE segments (Ⅱ, Ⅳ, and Ⅵ) and the CRF07_BC segments (Ⅰ: 790-1817 nt and Ⅲ: 2023-3455 nt). Reference CRF01_AE and CRF07_BC sequences with known sampling years were downloaded from the LANL HIV Sequence Database or obtained from HIV-infected blood samples collected by our laboratory, and were analyzed together with the target sequences in this study. As shown in Fig. 3, the tMRCAs of the novel recombinant strains were estimated at approximately 2009.3 (95% highest posterior density (HPD): 2007.0-2011.2) for the CRF01_AE regions (Ⅱ, Ⅳ, and Ⅵ) and 2011.7 (95% HPD: 2010.8-2012.5) for the CRF07_BC regions (Ⅰ and Ⅲ: 2023-3455 nt). However, in the CRF07_BC analysis, sequence 13SJ011 (patient ID) did not cluster with the other three sequences obtained in Shenzhen and was dated to approximately 2008.2 (95% HPD: 2007.4-2009.0). It may therefore represent an earlier ancestor of the three Shenzhen sequences, or it may have a distinct origin while sharing the same recombination breakpoints. Because this sequence is short, we cannot distinguish between these possibilities.
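The two summary statistics used above can be sketched in a few lines of Python with NumPy. The first function is an ordinary least-squares regression of root-to-tip divergence on sampling year, the regression whose R² TempEst reports as a measure of temporal signal. It assumes root-to-tip distances have already been measured on a rooted tree and, unlike TempEst, does not search for a best-fitting root. The second function computes an HPD interval as the shortest interval containing the requested fraction of posterior samples, the usual way MCMC output is summarized. Function names and all input values are hypothetical and are not data from this study.

```python
import numpy as np


def root_to_tip_regression(sampling_years, root_to_tip):
    """OLS fit of root-to-tip divergence on sampling year.

    Returns (r_squared, rate, root_date): the slope approximates the
    substitution rate and the x-intercept gives a rough root date.
    """
    x = np.asarray(sampling_years, dtype=float)
    y = np.asarray(root_to_tip, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot
    root_date = -intercept / slope  # year at which fitted divergence is zero
    return r_squared, float(slope), float(root_date)


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    k = max(1, int(round(mass * n)))     # number of samples inside the interval
    widths = s[k - 1:] - s[: n - k + 1]  # width of every window of k sorted samples
    i = int(np.argmin(widths))           # the narrowest window is the HPD interval
    return float(s[i]), float(s[i + k - 1])


# Hypothetical toy inputs, for illustration only.
years = [2005, 2008, 2011, 2014]
divergence = [0.005, 0.008, 0.011, 0.014]
r2, rate, root = root_to_tip_regression(years, divergence)
print("sufficient temporal signal:", r2 > 0.7, "rate:", rate, "root:", root)

rng = np.random.default_rng(1)
tmrca_draws = rng.normal(2010.0, 1.0, size=5000)  # hypothetical MCMC draws
print("95% HPD:", hpd_interval(tmrca_draws))
```

The R² cutoffs of 0.5 and 0.7 are the ones applied in this study rather than universal thresholds. The x-intercept of the regression is only a rough, non-Bayesian counterpart of the tMRCA estimated from the MCC trees.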
Because the patients had no direct epidemiological links and the strains infecting them shared the same mosaic structure, the three recombinant strains from Shenzhen were, in accordance with the HIV-1 CRF nomenclature proposal[15], recognized by and included in the HIV database and designated a new CRF, CRF120_0107. Given that CRF120_0107 has not previously been described in other countries or regions, these results suggest that it emerged in Shenzhen between approximately 2009 and 2011 (Fig. 3).
Given its rapid economic development and large mobile population, Shenzhen has great potential for the spread of HIV both locally and across the country. The origin and outbreak of CRF55_01B among MSM in Shenzhen has changed HIV-1 molecular epidemiological patterns in the region, and perhaps across the nation[16, 17]. In addition, since CRF01_AE/CRF07_BC recombinants were first discovered, the MSM population has contributed to the emergence of most URFs_0107 and CRFs_0107. Second-generation CRFs such as CRF120_0107 have been identified particularly among MSM, who have become a key high-risk group for the generation and spread of these recombinants. The apparent advantage of these strains in emerging and spreading among MSM in the region suggests that they may become more widely transmitted in the future. At the same time, the generation of second-generation HIV-1 recombinant forms (CRF01_AE/CRF07_BC) acts as an evolutionary force that increases HIV diversity and complexity in this region.