Transmitted drug resistance mutations and natural polymorphisms of CRF01_AE before treatment
In this study, 40 out of 2034 (1.97%) treatment-naïve CRF01_AE-infected patients had transmitted DRMs, the most common being K103N, G190S, K101E, T215S, K65R, and K219Q. In addition to the above DRMs, natural polymorphisms of amino acids with a prevalence >1% were detected at 53 (53/240, 22.1%) sites in RT, of which nine sites (40, 68, 69, 98, 103, 118, 179, 210, and 238) were known drug resistance-associated sites. Moreover, 31 sites (4, 5, 6, 8, 11, 28, 32, 35, 36, 39, 40, 43, 88, 103, 104, 105, 111, 118, 123, 135, 172, 173, 174, 177, 179, 200, 203, 207, 211, 214, and 238) in CRF01_AE had higher mutation rates than subtype B HIV-1 strains in the Stanford HIV Drug Resistance Database (|Z value|≥3) (Fig. 1). These 31 sites were defined as CRF01_AE-specific polymorphism sites. They included five known drug resistance-associated sites, namely site 238 (73.8%), site 118 (26.1%), site 179 (21.2%), site 103 (8.1%), and site 40 (3.1%), as well as 26 other sites not known to be associated with drug resistance (Fig. 1).
According to the phylogenetic analysis, the 2034 sequences mainly belonged to two CRF01_AE lineages: 416 (20.5%) sequences belonged to lineage 4 and 1522 (74.8%) to lineage 5 (Additional file 4: Figure S2). Fifty-one and forty-four natural polymorphism sites were detected in lineages 4 and 5, respectively, with 35 sites differing between the two lineages (|Z value|≥3). Both lineages had 26 polymorphism sites with higher mutation rates than subtype B HIV-1 globally (|Z value|≥3), including two known drug resistance-associated sites (sites 179 and 238) (Fig. 1).
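The |Z value|≥3 criterion used above can be illustrated with a pooled two-proportion Z statistic. This is a minimal sketch only: the text does not specify which form of the Z test was applied, so the pooled form is an assumption, the function name `two_proportion_z` is introduced here for illustration, and the counts are hypothetical rather than taken from the study or the Stanford database.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z statistic for x1/n1 versus x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are all-mutant or all-wild-type: no detectable difference.
        return 0.0
    return (p1 - p2) / se

# Hypothetical counts (NOT from the study): a site mutated in 200 of 1000
# CRF01_AE sequences and in 500 of 10000 subtype B sequences.
z = two_proportion_z(200, 1000, 500, 10000)

# Threshold quoted in the text: |Z| >= 3 flags the site as lineage-specific.
is_specific = abs(z) >= 3
print(f"Z = {z:.2f}, CRF01_AE-specific: {is_specific}")
```

A positive Z here means the first group (CRF01_AE in this sketch) has the higher mutation rate; swapping the argument order flips the sign but not the magnitude.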
Natural polymorphisms of CRF01_AE had little impact on treatment outcomes
A total of 1330 out of 2034 CRF01_AE-infected patients received first-line ART, of whom 105 (7.9%) experienced TF and 1148 (86.3%) experienced TS. We found 13 sites that differed between TF and TS patients: polymorphisms at sites 75 and 189 were found only in TF patients, and polymorphisms at sites 4, 5, 8, 21, 32, 49, 105, 165, 169, 171, and 204 were found only in TS patients. The mutation rate of site 75 in TF patients was significantly higher than in TS patients (|Z value|≥3) (Fig. 2).
Common DRMs and potential new DRMs developed in CRF01_AE-infected patients with TDF/3TC/EFV TF
Forty-two CRF01_AE-infected patients with TDF/3TC/EFV TF were selected according to the flow chart presented in Figure S1 to determine the acquired DRM profile of CRF01_AE. The median time between the baseline and TF sampling time points among the 42 TF patients was 184 days (interquartile range: 177.0–236.5). The number of DRMs at the TF time point was significantly higher than at baseline (Z=-5.604, p<0.001). For each of the 42 TF patients, the baseline and TF sequences clustered together with a bootstrap value higher than 85 in the phylogenetic tree (Additional file 5: Figure S3). The mutation rates of 14 sites increased significantly at the TF time point, with increases ranging from 9.5% to 66.7% (Table 1). Of these 14 sites, 13 were known drug resistance-associated sites, including seven NRTI-associated sites and six NNRTI-associated sites. The NRTI-associated DRMs detected at the TF time point, in descending order of frequency, were K65R (57.1%), M184V/I (47.6%), S68G (26.2%), A62V (14.3%), K70E/R (9.5%), and Y115F (9.5%). The NNRTI-associated DRMs detected at the TF time point were G190S/C (66.7%), K101E/N/Q (52.4%), V179D/I/A/T/E (45.2%), Y181C (42.9%), K103R/N/S (42.9%), and V106M (23.8%) (Table 1). Notably, an uncharacterized mutation (V75L) was detected at site 75, a known drug resistance-associated site; its frequency increased from 4.8% at baseline to 16.7% at the TF time point (Z value=2.494, p<0.05; McNemar test p=0.008). Moreover, a new mutation (L228R) was detected at site 228, which is not a DRM site in the Stanford HIVdb algorithm; its frequency increased from 0% at baseline to 11.9% at the TF time point (Z value=2.306, p<0.05; McNemar test p=0.063). We speculated that both V75L and L228R might be potential new DRMs in CRF01_AE.
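The paired baseline-versus-TF comparison for L228R can be worked through with an exact McNemar test. L228R was absent in all 42 patients at baseline and present in 11.9% (5 of 42) at TF, so every discordant pair is a gain: no patient lost the mutation and five acquired it. The sketch below assumes the exact two-sided binomial form of the test (the text does not say which variant was used), and the function name `mcnemar_exact` is introduced here for illustration.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: pairs with the mutation at baseline only
    c: pairs with the mutation at TF only
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of change
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to k, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# L228R: 0 of 42 at baseline, 5 of 42 at TF, hence b = 0 and c = 5.
p_l228r = mcnemar_exact(0, 5)
print(p_l228r)  # 0.0625
```

Under that assumption the result, 0.0625, is consistent with the reported McNemar p value of 0.063 for L228R. The same calculation cannot be repeated for V75L from the percentages alone, because the number of patients who lost or gained the mutation between time points is not given.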
Relationships of potential new DRMs with known DRMs
To explore the role of potential new DRMs, the mutations at 14 sites with significantly increased mutation rates at TF were used for co-variation analyses. Nine known DRMs (K65R, V106M, Y115F, V179T/E/D, Y181C, M184V, and G190S) and two potential new DRMs (V75L and L228R) were demonstrated to be under positive selection pressure (Ka/Ks >1, LOD >2). Twenty-eight links were detected among these mutations (cKa/Ks >1, LOD >2) (Table 2). Among them, the known DRMs Y181C and G190S showed the strongest correlation (cKa/Ks(Y181C-G190S) = 22.86, LOD = infinity). V75L was correlated with known DRMs G190S (cKa/Ks(V75L-G190S) = 3.24, LOD = infinity), K65R (cKa/Ks(K65R-V75L) = 2.00, LOD = 5.04), and M184V (cKa/Ks(V75L-M184V) = 1.25, LOD = 4.03). L228R was correlated with known DRMs G190S (cKa/Ks(L228R-G190S) = 2.25, LOD = infinity) and K65R (cKa/Ks(K65R-L228R) = 2.00, LOD = 3.46), and strongly correlated with Y181C (cKa/Ks(Y181C-L228R) = 6.00, LOD = 4.09) (Table 2).
L228R occurred simultaneously with or followed the appearance of Y181C
To further explore the temporal association and the evolutionary dynamics between Y181C and L228R, longitudinal plasma samples of four CRF01_AE-infected patients with Y181C and L228R mutations were studied using deep sequencing. The first case demonstrated a time lag between the Y181C and L228R mutations: Y181C occurred in 53.4% of the sequences at 1 month post-treatment and increased to 100% at 3 months post-treatment, whereas L228R did not appear until 6 months post-treatment, when 87.1% of sequences carried both Y181C and L228R. The second and third cases had Y181C and L228R only at TF. In the second case, 100% of sequences carried both Y181C and L228R; in the third case, 80% of sequences carried both mutations and the remaining 20% carried only Y181C (Fig. 3). The fourth case could not be analyzed due to sequencing failure.
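The per-sample linkage percentages reported above amount to tallying, for each sequence, which of the two mutations it carries. The following is a minimal sketch of that tally, not the study's pipeline: the function name `linkage_fractions` and the input representation (one set of mutation names per sequence) are assumptions, and the reads are synthetic, constructed only to mirror the 80%/20% split described for the third case.

```python
from collections import Counter

def linkage_fractions(reads, a="Y181C", b="L228R"):
    """Fraction of reads carrying both mutations, only one, or neither."""
    counts = Counter()
    for muts in reads:
        has_a, has_b = a in muts, b in muts
        if has_a and has_b:
            key = "both"
        elif has_a:
            key = f"{a} only"
        elif has_b:
            key = f"{b} only"
        else:
            key = "neither"
        counts[key] += 1
    total = len(reads)
    return {k: v / total for k, v in counts.items()} if total else {}

# Synthetic reads (NOT study data): 80 carry both mutations, 20 carry only Y181C.
reads = [{"Y181C", "L228R"}] * 80 + [{"Y181C"}] * 20
fractions = linkage_fractions(reads)
print(fractions)
```

Categories with no reads are simply absent from the returned dictionary, so a sample in which L228R never appears without Y181C will have no "L228R only" entry.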