HIV CRF07_BC is Independently Associated With Slower Disease Progression Than Subtype B in Chinese Patients: A Retrospective Observational Cohort Study

HIV subtypes convey important epidemiological information and possibly influence the rate of disease progression. In this study, HIV disease progression in patients infected with CRF01_AE, CRF07_BC, and subtype B was compared in the largest HIV molecular epidemiology study ever done in China. A national data set of HIV pol sequences was assembled by pooling sequences from public databases and the Beijing HIV laboratory network. Logistic regression was used to assess factors associated with the risk of AIDS at diagnosis (AIDSAD, defined as a CD4 count <200 cells/µL) in patients with HIV subtype B, CRF01_AE, and CRF07_BC. Of the 20,663 sequences, 9,156 (44.3%) were CRF01_AE. CRF07_BC was responsible for 28.3% of infections, followed by B (13.9%). In multivariable analysis, the risk of AIDSAD differed significantly according to HIV subtype (OR for CRF07_BC vs. B: 0.46, 95% CI 0.39-0.53), age (OR for ≥65 years vs. <18 years: 4.3, 95% CI 1.81-11.8), and transmission risk group (OR for men who have sex with men vs. heterosexuals: 0.67, 95% CI 0.6-0.75). These findings suggest that HIV diversity in China is constantly evolving and gaining complexity. CRF07_BC is less pathogenic than subtype B, while CRF01_AE is as pathogenic as B.


Introduction
China has a slowly increasing HIV epidemic, with 64,170, 71,204, and 63,154 new cases in 2018, 2019, and 2020, respectively, and 818,360 individuals living with HIV at the end of 2020. 1 During the first two decades of the epidemic (1985-2005), most HIV cases were concentrated among injection drug users (IDU; 44.2%) and former blood donors (29.6%), but since 2006 there has been a clear expansion of HIV cases among heterosexuals and men who have sex with men (MSM). In 2019, heterosexuals and MSM accounted for 73.8% and 23.3% of new diagnoses, respectively, with IDU accounting for 3.4%. 2 Understanding the increase in HIV diversity within China is not only of epidemiological interest but also has far-reaching clinical implications. [3][4][5][6][7][8][9] One notable view concerning HIV subtypes in China is that CRF01_AE progresses faster than CRF07_BC. [10][11][12][13][14] However, the studies supporting it were limited by small sample sizes and did not adjust for important confounding factors. Worldwide, studies have consistently ranked the rates of disease progression among HIV subtypes, in descending order, as C > D > CRF01_AE > G > A. [15][16][17][18][19][20][21][22][23][24][25][26] Although subtype B is the most studied because of its predominance in North America and Europe, it is absent from this comparison chain.
When subtype B is compared with non-B strains pooled as a single comparator, it is implicitly assumed that all non-B subtypes progress at the same rate, which is clearly not the case. To date, no studies have been sufficiently large to compare subtype B directly with other single subtypes. [15][16][17][18][19][20][21][22][23][24][25][26] The latest national HIV epidemiology study in China was conducted in 2006 and published in 2012. 4 Fourteen years have passed, and China's epidemic has changed. In this study, HIV disease progression was compared between patients infected with subtype B, CRF01_AE, and CRF07_BC in the largest HIV molecular epidemiology study ever conducted in China.

Study population and design
The study population consisted of two separate populations of HIV-infected individuals. The first group comprised all patients who underwent HIV transmitted drug resistance (TDR) genotyping between 2001 and 2020 at the Beijing HIV laboratory network (BHLN). BHLN is a national collaboration engaged in surveillance of HIV TDR in China. 29,30 These methods have been previously described. Briefly, approximately 40% of the samples from all individuals newly diagnosed with HIV infection by BHLN between 2001 and 2020 were randomly selected. 29,30 The BHLN takes part in maintaining the national HIV epidemiology database, which tracks everyone who receives a diagnosis of HIV infection in China and records the baseline CD4 count of all individuals with newly diagnosed HIV infection. The baseline CD4 count was the measurement closest to, and within one year of, the date on which HIV infection was confirmed by western blot. Baseline demographic data on sex, age, ethnicity, Hukou province, and transmission risk group were retrieved from this database.
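The baseline CD4 rule described above can be illustrated with a minimal Python sketch. The function name, the data layout, and the reading of "within one year" as a symmetric window of 365 days around the confirmation date are assumptions made for illustration; the source does not specify them, and the values below are invented.

```python
from datetime import date

def baseline_cd4(confirmation_date, measurements, window_days=365):
    """Return the CD4 count measured closest to the confirmation date.

    measurements: list of (measurement_date, cd4_count) tuples (hypothetical layout).
    Only measurements within window_days of the confirmation date are eligible;
    the symmetric window is an assumption. Returns None if none is eligible.
    """
    eligible = []
    for measured_on, count in measurements:
        gap = abs((measured_on - confirmation_date).days)
        if gap <= window_days:
            eligible.append((gap, count))
    if not eligible:
        return None
    # The smallest gap in days identifies the closest measurement.
    return min(eligible, key=lambda pair: pair[0])[1]

# Illustrative values only, not study data.
history = [
    (date(2015, 1, 10), 410),
    (date(2015, 3, 1), 350),
    (date(2017, 1, 1), 120),
]
print(baseline_cd4(date(2015, 2, 20), history))
```

Under this reading, a patient whose only CD4 measurement lies more than a year from confirmation has no baseline value and would be handled as missing data.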
The second group included publicly available sequences from the Los Alamos HIV sequence database. All the pol sequences sampled in China with known sampling provinces, sampling years, and transmission risk groups available in the database were downloaded (data available as of December 1, 2019).

Phylogenetic analysis
Sequences were aligned using the BioEdit tool and the alignment was manually corrected according to the encoded reading frame. Duplicate sequences were discarded. If several sequences from the same patient were available in the database, only the oldest was retained. The genotype of long-branch sequences was re-confirmed, and those that were miscatalogued were eliminated from the study.
A maximum likelihood phylogenetic tree was reconstructed from the merged dataset using the GTR + CAT nucleotide substitution model in FastTree 2.1. 36 The HIV subtype was inferred by automated subtyping using context-based modeling for expeditious typing (COMET), 37 followed by phylogenetic analysis. Each sequence was assigned to one of eight subtypes, one of 102 circulating recombinant forms (CRF), or "unassigned." An "unassigned" sequence was deemed a possible unique recombinant form (URF). 6
The BHLN may also be used as a cohort to study the natural disease progression of HIV in China. The starting point of the study was set as the onset of infection, and the outcome was laboratory-defined acquired immunodeficiency syndrome (AIDS; a CD4 count <200 cells/µL) at diagnosis (AIDSAD). The follow-up time was the duration between the starting point and the outcome.
As the seroconversion time was unavailable for most participants, the follow-up time could not be measured. To solve this problem, the follow-up time was treated as a matching variable in the cohort analysis, as we hypothesized that the distribution of the follow-up time was well matched within the same transmission risk group and roughly matched across the study population as a whole. Two sensitivity analyses were performed by restricting the comparison of subtypes to heterosexuals and to MSM.
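The de-duplication steps above can be sketched in Python as follows. This is a minimal illustration only: the record fields (`patient_id`, `sample_year`, `sequence`) are hypothetical, and treating a "duplicate" as a record with an identical patient, year, and sequence is an assumption, since the source does not define the term.

```python
def one_sequence_per_patient(records):
    """Drop exact duplicate records, then keep the oldest sequence per patient.

    records: list of dicts with hypothetical keys
    'patient_id', 'sample_year', and 'sequence'.
    """
    # Step 1: discard exact duplicates (assumed definition of "duplicate").
    seen = set()
    unique = []
    for rec in records:
        key = (rec["patient_id"], rec["sample_year"], rec["sequence"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # Step 2: retain only the earliest-sampled sequence for each patient.
    earliest = {}
    for rec in unique:
        kept = earliest.get(rec["patient_id"])
        if kept is None or rec["sample_year"] < kept["sample_year"]:
            earliest[rec["patient_id"]] = rec
    return list(earliest.values())

# Illustrative records only, not study data.
records = [
    {"patient_id": "P1", "sample_year": 2012, "sequence": "ACGT"},
    {"patient_id": "P1", "sample_year": 2009, "sequence": "ACGA"},
    {"patient_id": "P1", "sample_year": 2009, "sequence": "ACGA"},
    {"patient_id": "P2", "sample_year": 2015, "sequence": "ACGT"},
]
for rec in one_sequence_per_patient(records):
    print(rec["patient_id"], rec["sample_year"])
```

Keeping a single sequence per patient ensures that each sequence in the merged dataset represents a distinct individual.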

Statistical analysis
For geographic location, participants were grouped into 31 provinces according to the Hukou. Hukou is a basic household registration system in China; this system officially identifies a person as a resident of an area and includes identifying information such as name, parents, spouse, and date of birth. These provinces were further divided into six regions according to their proximity and socio-economic status, in line with guidelines from the National Bureau of Statistics of China: north, northeast, east, central-south, southwest, and northwest.
Six sampling phases were established: 1994-2005, 2006-2008, 2009-2011, 2012-2014, 2015-2017, and 2018-2020. The earliest phase (1994-2005) encompassed more years to account for the relatively few data available for those years. The prevalence of each subtype by sex, age, ethnicity, transmission risk group, Hukou province, and region was calculated, and the subtype distribution trends over the six sampling phases were examined. Categorical data were compared using the χ² test and continuous data were compared using one-way analysis of variance, wherever appropriate. Potential risk factors for AIDSAD were analyzed using logistic regression. Biologically plausible interactions were assessed in the multivariable model. Variables included sex, age (<18, 18-24, 25-44, 45-64, and ≥65 years), ethnicity, region, subtype, transmission risk group, and sampling phase. The outcome in the model was a binary response indicating whether each patient had AIDSAD. Each variable was first analyzed separately, and those associated with the outcome (P < 0.1) were entered into the multivariable model.
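The categorical coding of age and sampling phase used as model covariates can be sketched in Python as follows. The function names are hypothetical. The phase boundaries 2012-2014 and 2018-2020 are assumed here from the three-year pattern of the other phases and from the 2018-2020 phase mentioned in the Discussion, so they should be read as an assumption rather than as the authors' stated definition.

```python
from bisect import bisect_right

# Lower bounds of the second and later age groups (years).
AGE_CUTS = [18, 25, 45, 65]
AGE_LABELS = ["<18", "18-24", "25-44", "45-64", ">=65"]

# First year of each sampling phase; 2012 and 2018 are assumed boundaries.
PHASE_STARTS = [1994, 2006, 2009, 2012, 2015, 2018]
PHASE_LABELS = ["1994-2005", "2006-2008", "2009-2011",
                "2012-2014", "2015-2017", "2018-2020"]

def age_group(age):
    """Map an age in years to one of the five age categories."""
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]

def sampling_phase(year):
    """Map a sampling year (1994-2020) to one of the six sampling phases."""
    if not 1994 <= year <= 2020:
        raise ValueError(f"year {year} is outside the study period")
    return PHASE_LABELS[bisect_right(PHASE_STARTS, year) - 1]

print(age_group(32), sampling_phase(2007))
```

Coding both variables as categories, rather than as continuous values, matches the way the odds ratios are reported against a reference group (for example, ≥65 years vs. <18 years).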
Results are expressed as odds ratios (OR) with 95% confidence intervals (CI) and two-sided P values, where P < 0.05 was considered statistically significant. All analyses were performed using R software (version 3.6.1; R Foundation, Vienna, Austria), and listwise deletion was used to handle missing data.
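As an illustration of how an odds ratio and its 95% confidence interval are formed, the following Python sketch computes an unadjusted OR with a Wald interval from a 2×2 table. It is not the authors' analysis: the study used logistic regression in R and reported adjusted estimates, and the function name and counts below are hypothetical.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a Wald confidence interval.

    a, b: exposed group with and without the outcome.
    c, d: reference group with and without the outcome.
    z: standard normal quantile (1.959964 gives a 95% interval).
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 30 of 100 exposed and 50 of 100 reference
# patients have the outcome.
estimate, lower, upper = odds_ratio_wald(30, 70, 50, 50)
print(f"OR {estimate:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 corresponds to a two-sided P value below 0.05 for the same Wald test, which is how the significance threshold above relates to the reported intervals.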

Ethical issues
All analyses were performed on de-identified datasets to protect participants' anonymity. The research ethics committee at the Beijing Center for Disease Prevention and Control approved this study, and all the methods in this study were performed in accordance with the approved guidelines. By law, consent was not required as these data were collected and analyzed in the course of routine public health surveillance.

Study Population
HIV pol sequences generated from 13,230 patient specimens submitted by the BHLN for TDR genotyping between 2001 and 2020 were analyzed. A further 7,433 pol sequences sampled in China, for which the province of origin, transmission risk group, and sampling year were available, were retrieved from the Los Alamos HIV sequence database. In all, 20,663 aligned HIV pol sequences were used in this analysis, each representing a distinct HIV-positive individual (Fig. 1). These data were collected between 1994 and 2020 from 31 provinces of China.
Most participants were men (94.2%) of Han ethnicity (93.1%). The median age was 32 years (interquartile range [IQR] 26-42). Where available, the overall median baseline CD4 count was 338 cells/µL (IQR 208-475). The transmission risk group was predominantly MSM (66.6%), followed by heterosexual (23.3%) and IDU (7.7%) (Table 1).
Figure 2 illustrates the regional distributions of the seven common HIV subtypes. CRF07_BC was more prevalent in the southwest and northwest regions. In the other four regions, the subtype with the highest prevalence was CRF01_AE. Of note, a significantly high prevalence of URF was detected in the southwest region (9.6%).
HIV subtype temporal trends

Phylogenetic analysis
Phylogenetic analysis revealed that the sequences from both sources were intermixed, suggesting that both sampling frames were drawn from the same overall population (Fig. S1-3). Three, seven, and four distinct clusters were identified within subtype B, CRF01_AE, and CRF07_BC, respectively, including 14,578 individuals (81.6% of all patients infected with the three major subtypes). The clusters were named based on a previous numbering system, 27 with new clusters added in the current study. The cluster size ranged from 175 to 2,964 individuals. Most clusters were MSM dominated (10 of 14). Table S6 presents the detailed characteristics of these clusters.

CRF07_BC progressed slower than subtype B
To study the effect of subtype on disease progression, the analysis was limited to individuals with available CD4 counts. CD4 counts of the seven common subtypes were compared overall and by sampling phase. The CD4 count of patients infected with CRF07_BC was consistently and significantly higher than that of patients infected with subtype B or CRF01_AE (P < 0.01) (Fig. 3). Therefore, we empirically hypothesized that disease progression in these subtypes could be different. Table 3 reports the association between patient characteristics and AIDSAD.
In univariable logistic analyses, the risk of AIDSAD was significantly associated with sex, age, transmission risk group, and HIV subtype. After adjustment for these factors in the multivariable analysis, patients infected with CRF07_BC had less than half the risk of AIDSAD of those infected with subtype B (OR 0.46, 95% CI 0.39-0.53). Patients aged 45 years or older had a higher risk of AIDSAD than the youngest patients (OR for 45-64 years vs. <18 years: 3.36, 95% CI 1.47-8.87; OR for ≥65 years vs. <18 years: 4.32, 95% CI 1.81-11.75). The risk of AIDSAD was lower in MSM than in heterosexual patients (OR 0.67, 95% CI 0.6-0.75).
The Yi ethnicity was associated with a lower risk of AIDSAD (OR 0.43, 95% CI 0.17-0.9); however, the sample size was very small. Two sensitivity analyses, restricted to heterosexuals and to MSM, were performed, and the outcomes were consistent with those for the whole population (data not shown).

Discussion
To our knowledge, this is the largest study to date reporting on the national distribution and trends of HIV subtypes in China, with a sample size of over 20,000 and spanning 1994-2020. 4 These data showed that the HIV epidemic in China exhibited some of the greatest genetic diversity in the world, consisting of 38 HIV subtypes. The only country with comparable diversity is the United States, which has approximately 15 subtypes. 6,7 This high and sharply increasing HIV subtype diversity in China is consistent with evidence from most regions of the world. [3][4][5][6][7][8][9] Although the prevalence of each of the three major subtypes varied, their combined prevalence was stable throughout the study period, which might indicate equally stable HIV transmission in China.
The data revealed that the previously described subtype compartmentalization 4 no longer existed across transmission risk groups and had weakened across geographic regions, but persisted, as strongly as ever, among the Uyghur and Yi ethnic groups. Global travel and acquisition of infection abroad, the floating population, and domestic transmission all likely contribute to increasing HIV viral diversity. [4][5][6][7][8][9] The comparison of disease progression between subtype B and other subtypes has been hindered by the fact that few populations have multiple circulating subtypes that include subtype B. [15][16][17][18][19][20][21][22][23][24][25][26] The epidemic in China, characterized by co-circulating CRF01_AE, CRF07_BC, and subtype B, provided a unique opportunity for such a direct comparison. The data revealed that CRF07_BC progressed slower than subtype B, while CRF01_AE progressed as fast as subtype B.
Consistent with the results of the Concerted Action on SeroConversion to AIDS and Death in Europe (CASCADE), 16,17 disease progression did not differ significantly by sex. The middle (45-64 years) and older (≥65 years) age groups had faster disease progression than the young (<18 years). However, slower disease progression was observed in MSM compared with heterosexuals.
We hypothesize that this difference could be attributed to the shorter interval between seroconversion and diagnosis in MSM compared with heterosexuals, because most targeted HIV testing campaigns in China have focused on the MSM population. 2
CRF07_BC is a relatively young HIV strain that originated in IDU in China and is mainly confined to China. 28 During the past two decades, the number of individuals infected with CRF07_BC has increased significantly in China, accounting for 38% of all infections in the 2018-2020 phase. Although it descends from the two most prevalent strains in the world (subtypes B and C), CRF07_BC displays many unique characteristics that differ from those of its parent strains. Li et al. have also observed that individuals infected with CRF07_BC have significantly higher baseline CD4 counts than those infected with CRF01_AE. 14 However, they did not recognize that higher CD4 counts could be regarded as a proxy for slower disease progression, nor did they conclude from their finding that CRF07_BC progresses slower than CRF01_AE. We have previously shown that CRF07_BC has a lower TDR prevalence than subtype B and CRF01_AE (1.5% vs. 4.8% and 5.6%, respectively). 29,30 Ge et al. 31 and Cao et al. 32 have demonstrated that CRF07_BC is associated with better immune recovery than CRF01_AE in Chinese patients undergoing antiretroviral treatment (ART). Taken together, these results support the hypothesis that CRF07_BC is less pathogenic than subtype B.
Before 2014, the prevailing view in China was that people infected with HIV have approximately ten AIDS-free years before they enter the AIDS phase, as reported by the CASCADE study. 16
This study has several limitations. First, although it is the largest study of its kind, it represents only approximately two percent of all individuals living with HIV in China. Thus, these findings might not be fully representative. Second, viral load (VL) information was not included in the study, which precluded evaluation of the association between VL and subtype; this is a goal of a future study. Third, the biological mechanisms behind these observations were not elucidated. In this regard, Huang et al. 12 have shown that patients infected with CRF07_BC have significantly lower VL than patients infected with subtype B, which may be due to the deletion of seven amino acids that overlap with the apoptosis-linked gene 2-interacting protein (Alix)-binding domain of p6 gag. Fourth, the infection time for most of the participants was unavailable, so the rate of CD4 count decline per year could not be assessed. Indeed, since China implemented the World Health Organization (WHO) 'treat-all', 'treat-early', and 'treatment as prevention' policies in 2016, 34,35 approximately 90% of individuals with HIV have been treated with ART within the first year after diagnosis, so evaluating natural disease progression is not only impractical but also unethical. This study provides a novel method to directly compare the rate of natural disease progression between subtypes, namely, taking the duration between infection and diagnosis as the follow-up time and treating the follow-up time as a matching variable in multivariable logistic analysis. Fifth, the MSM population was most likely over-represented in the study sample. However, the original data, from which stratified and weighted results may be easily calculated, have been provided.
In summary, these results highlight an HIV epidemic in China characterized by a high prevalence of CRF01_AE, CRF07_BC, and subtype B infections, with an overall increase in subtype diversity over the past 26 years, providing a unique opportunity to directly compare disease progression among the three subtypes.
Disease progression was slower with CRF07_BC infection than with subtype B infection. Moreover, for the first time, it was shown that CRF01_AE progressed as fast as subtype B. Future studies focusing on the effect of subtype on the outcome of ART, which include more confounding variables, such as VL, will help improve clinical practice and policymaking.
Figure 1. Study profile.