HIV-1 subtyping
HIV-1 subtyping was performed with online automated subtyping tools, and all sequences were confirmed as HIV-1 subtype CRF02_AG.
Database-derived IN sequence resistance analyses
After excluding problematic sequences and multiple sequences from the same patient (to avoid overestimating variant frequencies), we analysed 287 sequences collected between 1994 and 2010. These sequences were screened for the presence of RAMs. RAMs were identified in 12.9% (37/287) of the sequences, and only 1.0% (3/287) carried major INSTI RAMs: T66A, Q148H, R263K and N155H. Two of these mutations, Q148H and R263K, occurred together in one sequence (0.3%), whereas T66A and N155H were each present alone in one sequence. In addition, 11.8% (34/287) of the sequences contained five different IN accessory mutations, namely Q95K, T97A, G149A, E157Q and D232N. Mutations G149A and D232H occurred together in one sequence (0.3%). Notably, one sequence dating from 2010 carried two major mutations, Q148H and R263K, in combination with two minor mutations, G149A and D232H.
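The screening step described above can be illustrated with a minimal Python sketch. The major and accessory mutation lists are taken from this paragraph; the three example sequences and the function name `screen` are hypothetical and serve only to show the logic, not the tool actually used in the study.

```python
# Mutation lists are taken from the text above; the example sequences
# and the function name `screen` are hypothetical illustrations.
MAJOR_RAMS = {"T66A", "Q148H", "R263K", "N155H"}
ACCESSORY_RAMS = {"Q95K", "T97A", "G149A", "E157Q", "D232N"}


def screen(mutations):
    """Split one sequence's mutations into major and accessory RAMs."""
    found = set(mutations)
    return {
        "major": sorted(found & MAJOR_RAMS),
        "accessory": sorted(found & ACCESSORY_RAMS),
    }


example_sequences = {
    "seq_A": ["Q148H", "R263K", "G149A"],  # two major RAMs plus one accessory
    "seq_B": ["T97A"],                     # accessory mutation only
    "seq_C": ["V31I"],                     # polymorphism only, no RAM
}

results = {name: screen(muts) for name, muts in example_sequences.items()}

# Sequences carrying at least one RAM of either class.
with_any_ram = [n for n, r in results.items() if r["major"] or r["accessory"]]

for name, r in results.items():
    print(name, r)
print(f"{len(with_any_ram)}/{len(results)} example sequences carry a RAM")
```

Keeping the two mutation classes as separate sets makes it straightforward to report major and accessory prevalence independently, as is done in the text.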
Generation of the Cameroonian HIV-1 CRF02_AG consensus sequence
The consensus sequence generated from the database-derived HIV-1 CRF02_AG sequences (n = 287) and the cohort sequences (n = 20) revealed 20 naturally occurring polymorphisms (NOPs): E11D, K14R, V31I, M50I, I72V, L74MVI, L101I, T112V, T124A, T124A, G134N, I135V, K136K/Q, V201I, T206S, T218I, L234I, A265V, R269K and S283G (Fig. 1). Three of these (E11D, K14R and V31I) belong to the NTD, whereas M50I belongs to the loop region connecting the NTD and CTD. Eleven NOPs (I72V, L74MVI, L101I, T112V, T124A, T124A, G134N, I135V, K136K/Q, V201I and T206S) are part of the CCD, and the remaining five (T218I, L234I, A265V, R269K and S283G) belong to the CTD.
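As an illustration of how polymorphisms can be read off a consensus, the following Python sketch builds a simple majority-rule consensus from aligned sequences and lists the positions where it differs from a reference. The majority rule is an assumption (the text does not state how the consensus was computed), and the reference string, the toy alignment and both function names are hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent residue at each alignment column.

    Assumes all sequences are pre-aligned and of equal length; ties are
    resolved in favour of the residue seen first in the column.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def polymorphisms(reference, cons, start=1):
    """List substitutions (e.g. 'K7R') where the consensus differs from the reference."""
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, cons), start)
        if ref != alt
    ]


# Hypothetical nine-residue fragment, not real integrase sequence data.
reference = "FLDGIDKAQ"
aligned = [
    "FLDGIDRAQ",
    "FLDGIDRAQ",
    "FLEGIDKAQ",
]

cons = consensus(aligned)
print(cons)
print(polymorphisms(reference, cons))
```

A minority variant (the single E at position 3 in the toy alignment) does not enter the consensus, which is why only majority substitutions are reported as polymorphisms in this sketch.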
Molecular modelling and structural analysis
Figure 2 shows the 3D tetrameric structure of HIV-1 CRF02_AG IN, which consists of 288 amino acids, 10 alpha helices, 9 beta sheets and 19 coil regions. The homology model passed all external 3D quality validation checks applied to it. The Verify3D score for the model was 71.1%, the ERRAT score for every chain was 86.0% or higher, the Ramachandran plot placed 98.0% of residues in the most favoured and allowed regions, and the ProSA Z-score was −6.18, which is within the range of proteins of similar size. Superimposing the template 5u1c onto the energy-minimized target structure gave an RMSD of 0.212 Å, indicating very little deviation in the main-chain atoms (Fig. 2b). Figure 2c shows the locations of the nine mutations relative to the active site.
Furthermore, mCSM predicted that eight of the nine variants (M50I, L74I, L74M, T97A, G118S, S119R, P145S and E157Q) were destabilizing, with predicted changes of −0.582, −1.069, −0.93, −1.051, −0.492, −0.091, −0.485 and −1.111 kcal/mol, respectively. Only the Q95K substitution was slightly stabilizing (0.146 kcal/mol).
Interaction analysis of the single amino acid changes revealed differences in the number and type of interactions with neighbouring residues and with the DNA. For T97A, the wild-type T97 formed four polar contacts, whereas the mutant A97 formed three (Table 1). This suggests a loss of stable contacts in this region that could destabilize the protein structure. Moreover, the S119R substitution introduced an interaction with the known active-site residue E92, which could alter the IN active site and reduce INI binding (Table 1). The E157 residue formed four contacts with neighbouring residues, whereas Q157 formed five polar contacts, two of which were with DNA (Table 1).
The remaining six substitutions (M50I, L74I, L74M, Q95K, G118S and P145S) showed no change in the number or type of interactions, implying no strong effect on protein structure or function (Table 1).
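The stability predictions reported above can be tabulated with a short Python sketch. The values are transcribed from the text; the sign convention (negative meaning destabilizing) follows the way the results are described here, and the variable names are illustrative only.

```python
# Predicted stability changes in kcal/mol, transcribed from the text.
# Negative values are treated as destabilizing, as described above.
ddg = {
    "M50I": -0.582,
    "L74I": -1.069,
    "L74M": -0.93,
    "T97A": -1.051,
    "G118S": -0.492,
    "S119R": -0.091,
    "P145S": -0.485,
    "E157Q": -1.111,
    "Q95K": 0.146,
}

# Destabilizing variants, ordered from most to least destabilizing.
destabilizing = sorted((m for m, v in ddg.items() if v < 0), key=ddg.get)
stabilizing = [m for m, v in ddg.items() if v > 0]

for name in destabilizing:
    print(f"{name:6s} {ddg[name]:+.3f} kcal/mol  destabilizing")
for name in stabilizing:
    print(f"{name:6s} {ddg[name]:+.3f} kcal/mol  stabilizing")
```

Sorting by predicted change makes it easy to see that E157Q, L74I and T97A have the largest predicted destabilizing effects, while S119R is close to neutral.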
Table 1
Summary of all interactions observed between the INSTIs and CRF02_AG IN subtype. The number in front of the brackets is the total number of interactions. Amino acid abbreviations: A, alanine; D, aspartic acid; E, glutamic acid; G, glycine; H, histidine; I, isoleucine; K, lysine; N, asparagine; Q, glutamine; R, arginine; S, serine; T, threonine; Y, tyrosine. Bold: change in amino acid.
| Number | Mutation | # Polar contacts (WT) | # Polar contacts (Mutant) |
|---|---|---|---|
| 1 | M50I | None | None |
| 2 | L74M | 1 (Glu87) | 1 (Glu87) |
| 3 | L74I | 1 (Glu87) | 1 (Glu87) |
| 4 | Q95K | 2 (1(Ala98), 1(Tyr99)) | 2 (1(Ala98), 1(Tyr99)) |
| 5 | T97A | 4 (2(T93), 1(G94), 1(I101)) | 3 (1(T93), 1(G94), 1(I101)) |
| 6 | G118S | None | None |
| 7 | S119R | 3 (1(Thy29), 1(Asn120), 1(Thr122)) | 3 (1(Thy29), 1(Glu92), 1(Thr122)) |
| 8 | P145S | 1 (Gln148) | 1 (Gln148) |
| 9 | E157Q | 4 (1(Ser153), 1(Met154), 1(Lys156), 1(Ile161)) | 5 (1(Thy20), 1(Ade21), 1(Ser153), 1(Met154), 1(Ile161)) |
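The comparison of wild-type and mutant contacts in Table 1 can be reproduced with a small Python sketch that reports which contacts are lost or gained for each substitution. The contact lists are transcribed from the table; the function name `contact_diff` and the data layout are illustrative assumptions, not part of the original analysis pipeline.

```python
from collections import Counter

# (wild-type contacts, mutant contacts), transcribed from Table 1.
# A partner listed twice means two contacts with the same residue.
contacts = {
    "M50I": ([], []),
    "L74M": (["Glu87"], ["Glu87"]),
    "L74I": (["Glu87"], ["Glu87"]),
    "Q95K": (["Ala98", "Tyr99"], ["Ala98", "Tyr99"]),
    "T97A": (["T93", "T93", "G94", "I101"], ["T93", "G94", "I101"]),
    "G118S": ([], []),
    "S119R": (["Thy29", "Asn120", "Thr122"], ["Thy29", "Glu92", "Thr122"]),
    "P145S": (["Gln148"], ["Gln148"]),
    "E157Q": (
        ["Ser153", "Met154", "Lys156", "Ile161"],
        ["Thy20", "Ade21", "Ser153", "Met154", "Ile161"],
    ),
}


def contact_diff(wt, mut):
    """Return (lost, gained) contacts, respecting repeated partners."""
    wt_c, mut_c = Counter(wt), Counter(mut)
    lost = sorted((wt_c - mut_c).elements())
    gained = sorted((mut_c - wt_c).elements())
    return lost, gained


# Keep only substitutions whose contact pattern differs from wild type.
changed = {}
for name, (wt, mut) in contacts.items():
    lost, gained = contact_diff(wt, mut)
    if lost or gained:
        changed[name] = {"lost": lost, "gained": gained}

for name, diff in changed.items():
    print(name, diff)
```

Comparing contact partners rather than totals matters for S119R: the count stays at three, but the Asn120 contact is replaced by one with Glu92, a change that a comparison of totals alone would miss.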