From sequence data to patient result: a solution for HIV drug resistance genotyping with Exatype, end to end software for Pol-HIV-1 Sanger based Sequence analysis and patient HIV drug resistance result generation

doi:10.21203/rs.3.rs-21484/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Introduction: With the rapid scale-up of antiretroviral therapy (ART) to treat HIV infection, there are ongoing concerns about the emergence and transmission of HIV drug resistance (HIVDR) mutations. This scale-up has led to an increased need for routine HIVDR testing to inform clinical decisions on regimen switches. Although the majority of wet laboratory processes are standardized, slow and labor-intensive data transfer and subjective manual sequence interpretation steps are still required to finalize and release patient results. We therefore set out to validate the applicability of a software package that independently generates HIVDR patient results from raw sequence data.

Methods: We assessed the performance characteristics of Hyrax Biosciences' Exatype (a fully automated sequence analysis software that goes from sequence data to patient result and consolidates RECall, MEGA X, and the Stanford HIV database) against the standard method (RECall and the Stanford database). Exatype is a web-based HIV drug resistance bioinformatic pipeline available at sanger.exatype.com. To validate Exatype, we used a test set of 135 remnant HIV viral load samples at the National HIV Reference Laboratory (NHRL).

Results: We analyzed 135 specimens and successfully generated results for 126 sequences with both the standard method and Exatype. Result production using Exatype required minimal hands-on time compared with the standard method (6 computation-hours using the standard method versus 1.5 computation-hours using Exatype). Concordance between the two systems was 99.8% for 311,227 bases compared. Of the 0.2% discordant bases, 99.7% were attributed to nucleotide mixtures resulting from sequence editing in RECall. Both methods identified similar (99.1%) critical antiretroviral resistance-associated mutations, resulting in a 99.2% concordance of resistance susceptibility interpretations. Base calling comparison between the two methods gave a Cohen's kappa of 0.97 to 0.99, implying almost perfect agreement with minimal base calling variation. On a predefined dataset, RECall editing displayed the highest probability of scoring mixtures accurately (1 vs. 0.71) and the lowest probability of inaccurately assigning mixtures to pure nucleotides (0.002–0.0008). This advantage is attributable to the manual sequence editing in RECall.

Conclusion: The reduction in hands-on time is a benefit of using the Exatype HIV DR sequence analysis and result generation platform. There is a minimal difference in base calling between Exatype and the standard method. Although the discrepancy has minimal impact on drug resistance interpretation, allowing sequence editing in Exatype, as in RECall, could significantly improve its performance.

Bioinformatics

Drug Resistance Testing

Validation

Efficiency and Results

Human immunodeficiency virus (HIV) drug resistance testing (DRT) has been used by WHO to guide policies relating to antiretroviral treatment (ART) dispensation at an individualized level in clinical practice as well as the public health recommendations for antiretroviral therapy regimens in various populations^1,2. Drug resistance testing identifies mutations within the viral genome that confer resistance to the patient regimen, thus allowing healthcare workers to optimize patient treatment, increasing the chance of successful virologic suppression. Furthermore, implementing platforms for drug resistance surveillance at the population level^3,4 can help minimize the use of ineffective drugs, improving population-wide treatment outcomes, and reducing the risk of transmitted HIV drug resistance^5–7. The Applied Biosystems™ HIV-1 Genotyping Kit procedure is one of the methods widely used for HIV drug resistance testing.

The HIV DRT wet laboratory process includes several steps: viral RNA extraction from plasma or dried blood spot (DBS) samples, reverse transcriptase-polymerase chain reaction (RT-PCR) amplification, nested PCR, gel documentation, nested PCR product clean-up, cycle sequencing, cycle sequencing product clean-up, and finally population-based (bulk) sequencing^8,9. Several sequencing primers, depending on the laboratory method, are required during the sequencing step to ensure complete bidirectional coverage over the entire length of the HIV-1 pol region of interest. In our laboratory, a laboratory specialist then assesses the quality of the sequences using an Applied Biosystems (ABI) sequence scanner before transferring the ABI sequence trace files from the genetic analyzer to a disc or flash drive. DNA sequence reads from each specimen are then separately assembled into a contiguous consensus sequence in FASTA format using RECall analysis software (web or standalone). Sequence scanner and MEGA X are used to assess the quality of the FASTA file and to check for contamination by phylogenetic analysis, and the file is eventually transferred to the Stanford HIVDB database for mutation interpretation. These steps require considerable hands-on time and a highly trained technician, and they can be challenging and time-consuming in a busy HIV DRT laboratory that processes more than 300 samples per week with limited human resources.

Although many HIV DR laboratories in resource-limited settings are moving to RECall as the standard software for contig assembly, which has standardized result reporting, resistance mutation reporting still varies in some cases between laboratories, even for identical samples¹⁰. Most of these inter-laboratory discrepancies come from differences in sample preparation procedures (e.g., extraction procedures, primer choice, quality assurance adherence, clean-up processes, or stochastic variation)^8,10–12. However, some still result from changes introduced by technicians as they subjectively review the assembled sequences^13,14. With the introduction of test and treat policies resulting in rapid ART initiation among those newly diagnosed with HIV¹⁵, most drug-resistant HIV variants are present at low frequencies in clinical isolates. Accurate identification of nucleotide "mixtures" (positions having two or more nucleotides) is therefore required, especially for DR surveillance^16–18. Limited laboratory specialist capability and experience in identifying low-level nucleotide mixtures could thus result in clinically relevant drug resistance mutations being missed^19,20. Although an external quality assurance program has standardized laboratory quality practices and protocols among the WHO-accredited laboratories¹⁸, the process does not cover HIV DR testing laboratories outside the WHO HIVResNet, even though these laboratories also support patient diagnosis. Moreover, despite these QA programs being in place, the impact on patient care of erroneous results due to subjective sequence interpretation is difficult to ascertain.

Implementing an automated sequence analysis and result reporting tool would enable objective and consistent interpretation and reporting of HIV genotype data and provide considerable practical advantages to clinicians, patients, and other healthcare workers who use the results for patient management. Most notably, it would improve result processing speed and significantly decrease the human resource needs that arise when implementing routine DR testing in resource-limited settings. The National HIV Reference Laboratory in Kenya has therefore validated a bioinformatics software tool, Exatype, that has the capabilities to address these challenges. Exatype consolidates the WHO-adopted processes for HIV DR genotyping into a single step: contig assembly, mutation calling, and drug-resistance interpretation are all automated. Specifically, Exatype includes the RECall software to interpret and analyze chromatograms and the Stanford HIVDB drug resistance algorithm for drug-resistance interpretation. In addition, it includes a genetic distance analysis that allows for the detection of contamination. As an automated process, Exatype is intended to support HIV DR testing laboratories with a heavy workload. It combines the functionalities of the RECall, Stanford HIV drug resistance database (HIVDB), and MEGA X programs and is available at sanger.exatype.com.

In this paper, we present field validation results for automated Exatype analysis and reporting of HIV DR results.

Laboratory methods

HIV drug resistance testing was performed at the National HIV Reference Laboratory in Kenya. Using the program guideline cut-off for viral suppression (1000 copies/ml) as the test kit sensitivity limit, we picked remnant samples from the HIV RNA measurement section after HIV-1 viral load testing²¹. We performed HIV genotypic resistance testing on 135 remnant patient samples. Plasma virus extraction was done on the ThermoFisher Kingfisher flex platform, followed by one-step RT-PCR, denaturing of amplicons, and finally a nested second-round PCR. For QA purposes, we assessed the PCR product on a gel. The PCR product was cleaned up with Exosap before cycle sequencing, and the sequencing product was purified with x-terminator. Direct bidirectional sequencing encompassing HIV-1 protease (PR) and the first 296 codons of reverse transcriptase (RT) was performed on an ABI 3730xl. Sequencing Analysis v 5.2 (ABI) was used to read the chromatograms. For quality assurance, nucleotide mixtures (positions containing two or more nucleotides, with the minor peak height being ≥20% of the major peak height) were marked with ABI 3730xl data collection software v 3.30.

Standard analysis procedure

After the necessary sequence QA procedures using a sequence scanner, a laboratory specialist assembles the sequence trace files for each sample to generate a consensus sequence using standalone RECall software. This software assists the specialist by highlighting areas of conflict: nucleotide positions with mixtures and positions where overlapping sequences do not have the same base call (20% threshold). N is used to mark regions of the sequence that cannot be resolved. The laboratory specialist then visually inspects each sequence, stopping at each conflict and making manual edits where necessary, so that every variation is verified. The generated consensus sequence for each sample is then submitted to MEGA X for a contamination check and later to Stanford HIVDB to create the patient result. In addition to the 135 patient specimens, we included 40 EQA dry panels from the WHO ResNet Lab group to ensure that the study conforms to the Clinical and Laboratory Standards Institute (CLSI) guidelines on laboratory method validation. We chose the EQA dry panel because our method validation covers only sequence data analysis tools and not wet laboratory processes.

Besides the standard method, we used Exatype to reanalyze and generate results from the 3730xl ABI trace files without sequence editing. As in RECall, overlapping peaks represent "mixed" or "ambiguous" bases. The locations of the primary peak (called base) and the most significant secondary peak (uncalled base) in the trace file are determined by phred. The software then aligns the peak positions to their corresponding locations in the .ab1 data, as primary and secondary peaks are often offset. Poor sequence quality regions at the beginning and end of each fragment are then automatically identified and trimmed. All chromatograms (.ab1) were submitted to Exatype and processed without any human intervention, using a standard laptop (Asus-i3 660 3.33-GHz CPU, 3 GB RAM, Windows 2010).

Exatype nucleotide mixture calling and "marking" of potentially problematic bases

The essential feature of Exatype is its consolidated workflow, in which no file transfer between separate software programs is necessary. Contig assembly and FASTA file generation (by implementing RECall) and the subsequent interpretation by Stanford HIVDB are done automatically, without any editing or file transfer to MEGA X for the contamination check or to Stanford HIVDB for result generation. Following the assembly and alignment step, mixtures are categorized based on the quality and area under the curve of the called and uncalled bases, as determined by phred within the built-in RECall software. The RECall configuration variables within Exatype that guide mixture calling for clinical drug resistance testing at the National HIV Reference Laboratory (NHRL) are listed in Table 2. Each position in the sequence alignment is examined sequentially, and samples that require manual editing are marked.

Exatype pass-fail criteria at the laboratory level

We performed quality checks on every sample trace file to ensure that the sequence was acceptable. Tables 2 and 3 list the sequence rejection criteria. Once a trace file is uploaded to Exatype and passes the RECall-specified internal quality control checks, Exatype automatically generates the sample results and corresponding FASTA files. At present, the software requires double primer coverage over the entire sequence length. We included only analyses that passed the Exatype-implemented quality control criteria in this study.

Subtyping and phylogenetic analysis

One hundred and twenty-six samples were successfully extracted, amplified for the RT and PR regions in the nested PCR, and sequenced. The generated sequences, covering codons 6-99 of the protease region and 1-251 of the RT region, were then aligned using RECall and used to construct phylogenetic trees by the neighbor-joining method with PAUP. Alignment of the generated sequences with Los Alamos database reference sequences revealed that 54 (43%), 31 (25%), 14 (11%), and 8 (6%) of the 126 specimens were subtype A, D, C, and G, respectively. Simplot analysis revealed a few recombinant types in our study samples: 7 (6%) AD, 2 (2%) AC, 2 (2%) AG, and 8 (6%) CRF01_AE (Table 1).

Data analyses: We compared the consensus sequences and results generated by the standard method and Exatype. Speed, concordance of base calls, and results were used to assess the performance of Exatype. Partial nucleotide discordance occurs when one method reports a nucleotide mixture and the other reports one of the mixture's components (e.g., RECall reported Y and Exatype reported C). Complete nucleotide discordance occurs when the two analysis methods report different nucleotides at the same position for the same sample (e.g., RECall reported T and Exatype reported C). This can also occur with a mixture when the nucleotide called by one method is not a component of the mixture called by the other (e.g., RECall reported G and Exatype reported Y).
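The discordance categories defined above can be sketched in a few lines of Python. This is a minimal illustration, not the comparison code used in the study: the function names are hypothetical, the two consensus sequences are assumed to be already aligned and of equal length, and two different mixtures that share a base (e.g., R vs. M) are treated as partial discordance, a case the definitions above do not address.

```python
# IUPAC nucleotide codes mapped to the set of bases each one represents.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "K": {"G", "T"},
    "M": {"A", "C"}, "S": {"C", "G"}, "W": {"A", "T"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def classify_call(standard: str, exatype: str) -> str:
    """Classify one aligned position as concordant, partial, or complete."""
    a, b = IUPAC[standard.upper()], IUPAC[exatype.upper()]
    if a == b:
        return "concordant"
    # Assumption: any shared base counts as partial discordance.
    if a & b:
        return "partial"
    return "complete"


def compare_sequences(standard_seq: str, exatype_seq: str) -> dict:
    """Tally the three categories over two aligned, equal-length sequences."""
    if len(standard_seq) != len(exatype_seq):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"concordant": 0, "partial": 0, "complete": 0}
    for s, e in zip(standard_seq, exatype_seq):
        counts[classify_call(s, e)] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences, not study data.
    print(compare_sequences("ACGTY", "ACGCC"))
```

With the examples from the text, `classify_call("Y", "C")` is partial, while `classify_call("T", "C")` and `classify_call("G", "Y")` are complete.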

We also compared the analysis of specific antiretroviral drug resistance mutation positions as defined in the International AIDS Society (IAS) table of key resistance mutations. We processed 126 samples on the Stanford HIV drug resistance genotyping Web service Sierra (algorithm version 8.8 [http://hivdb.stanford.edu/pages/algs/sierra_sequence.html]; Stanford University, Stanford, CA) to infer antiretroviral drug susceptibilities in RECall-analyzed PR-RT nucleotide sequences. The samples were reanalyzed with ANRS version 27, HIVDB version 8.9-1, and REGA version 8.0.2.

	Number of remnant clinical samples used
Subtype A	54(43%)
Subtype D	31(25%)
Subtype C	14(11%)
Subtype G	8(6%)
URF (AD, AC, AG)	11(9%)
AD	7(6%)
AC	2(2%)
AG	2(2%)
CRF01_AE	8(6%)

VL< 1000 cp/ml	34(27%)
VL>1000 cp/ml	92(73%)

Therapy naive	9(7%)
Therapy experience	103(82%)
Therapy unknown	14(11%)

Table 1: Distribution of the HIV-1 subtype in the samples used for validation

Parameter	Value	Interpretation
Quality censoring cutoff	<10	Phred quality score cutoff for excluding bases during assembly.
Mixture area (%)	>=20	The area of the uncalled peak must be at least 20% of the called peak area. If 50% of the reads pass this threshold, then a mixture is called.
Mark area (%)	>=15	The area of the uncalled peak must have at least 17.5% of the called peak area. If >=50% of the reads pass this threshold, then a mark is made.
Mark average quality cut-off (phred score)	<20	If the average quality of the base across all reads is below the cutoff, then a mark is made.
Additional marks		Insertions, deletions, and single primer coverage are also marked.

Table 2: Configuration variables for nucleotide mixture calling and base "marking" for clinical drug resistance genotyping^23,24
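The rules in Table 2 can be illustrated with a short Python sketch. It is a simplified reading of the table and not RECall's or Exatype's actual implementation: the `ReadPeak` type and the function names are hypothetical, peak areas are assumed to be already extracted per read, and the mark threshold is left as a parameter because Table 2 lists it as >=15 in the value column and 17.5% in the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadPeak:
    """One read's signal at a single alignment position (hypothetical type)."""
    called_area: float    # area under the called (primary) peak
    uncalled_area: float  # area under the largest uncalled (secondary) peak
    quality: int          # phred quality score of the called base


def fraction_passing(peaks: List[ReadPeak], area_pct: float) -> float:
    """Fraction of reads whose uncalled peak is >= area_pct % of the called peak."""
    if not peaks:
        return 0.0
    passing = sum(
        1 for p in peaks
        if p.called_area > 0 and 100.0 * p.uncalled_area / p.called_area >= area_pct
    )
    return passing / len(peaks)


def classify_position(
    peaks: List[ReadPeak],
    mixture_area_pct: float = 20.0,  # Table 2: mixture area
    mark_area_pct: float = 15.0,     # Table 2: mark area (assumed value, see lead-in)
    min_read_fraction: float = 0.5,  # at least 50% of reads must pass
    quality_censor: int = 10,        # bases with phred < 10 are excluded
    mark_avg_quality: int = 20,      # mark if average phred < 20
) -> str:
    """Return 'mixture', 'mark', 'pure', or 'no_call' for one position."""
    usable = [p for p in peaks if p.quality >= quality_censor]
    if not usable:
        return "no_call"
    if fraction_passing(usable, mixture_area_pct) >= min_read_fraction:
        return "mixture"
    avg_quality = sum(p.quality for p in usable) / len(usable)
    if (fraction_passing(usable, mark_area_pct) >= min_read_fraction
            or avg_quality < mark_avg_quality):
        return "mark"
    return "pure"


if __name__ == "__main__":
    # Toy peak areas, not study data.
    print(classify_position([ReadPeak(1000, 250, 40), ReadPeak(1000, 220, 35)]))
```

Keeping every threshold as a keyword argument makes it straightforward to try alternative settings without changing the logic.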

Failure category	Description
Stop codon	Any unambiguous stop codon (TGA, TAA, or TAG)
Bad inserts	An insertion relative to the reference sequence that is not a multiple of three bases, resulting in a frameshift
Bad deletion	A deletion relative to the reference sequence that is not a multiple of three bases, resulting in a frameshift
Too many mixtures	>3.5% of nucleotides sequences called as mixtures
N count	>=5 Ns (any base) in the sequence
Mark count	>=100 positions marked as being potentially problematic
Single coverage	>3 consecutive bases of single-read coverage with phred scores of 40
Low quality	Any section where the quality of all coverage is too low to make a call

Table 3: Criteria used by RECall for rejecting a sequence ^20,25
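As a rough illustration of how the criteria in Table 3 might be applied to a finished consensus sequence, the sketch below checks the criteria that can be evaluated from the sequence text alone. It is an assumption-laden example, not RECall code: the function name and its arguments are hypothetical, the sequence is assumed to start in frame, and insertion/deletion lengths and the mark count are assumed to be supplied by the assembly step. The single-coverage and low-quality criteria need read-level data and are omitted.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}
MIXTURE_CODES = set("RYKMSWBDHV")  # IUPAC ambiguity codes other than N


def rejection_reasons(seq, indel_lengths=(), mark_count=0):
    """Return the Table 3 failure categories triggered by a consensus sequence.

    seq           -- in-frame consensus nucleotide sequence
    indel_lengths -- lengths (in bases) of insertions/deletions vs. the reference
    mark_count    -- number of positions marked as potentially problematic
    """
    seq = seq.upper()
    reasons = []

    # Stop codon: only unambiguous codons can match the stop set.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if any(codon in STOP_CODONS for codon in codons):
        reasons.append("stop codon")

    # Bad insert/deletion: length not a multiple of three causes a frameshift.
    if any(length % 3 != 0 for length in indel_lengths):
        reasons.append("bad insert/deletion")

    # Too many mixtures: more than 3.5% of positions called as mixtures.
    mixtures = sum(1 for base in seq if base in MIXTURE_CODES)
    if seq and 100.0 * mixtures / len(seq) > 3.5:
        reasons.append("too many mixtures")

    # N count: five or more Ns in the sequence.
    if seq.count("N") >= 5:
        reasons.append("too many Ns")

    # Mark count: 100 or more marked positions.
    if mark_count >= 100:
        reasons.append("too many marks")

    return reasons


if __name__ == "__main__":
    # Toy sequence, not study data.
    print(rejection_reasons("ATGTAAGCC"))
```

A sequence that triggers none of the checks returns an empty list and would proceed to result generation under these assumptions.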

RECall was able to generate a consensus sequence for 98% (132/135) of the pol experiments, whereas Exatype was successful in 93.3% (126/135) of the tests (Table 4). In total, 126 (93.3%) met both the default Exatype and RECall acceptability criteria after automated processing. Inadequate double primer coverage over the entire sequence length was the primary reason for failure in Exatype, whereas RECall has the flexibility of allowing single primer coverage. For the standard analysis using a standard laptop (ASUS-i3 660 3.33-GHz CPU, 3 GB RAM, Windows XP), we performed RECall base calling, assembly, the contamination check using MEGA X, and alignment in less than 4 hours, including human review and editing of sequences. We then used Stanford HIVDB to generate patient results in one hour.

In contrast, we completed the entire analysis, QA contamination report generation, and patient result generation in Exatype on the same laptop within one hour. The longer time in the standard software pipeline is attributed to the sequence review and edits required before exporting the contig to separate software: MEGA X for QA analysis and Stanford HIVDB for patient result generation. In Exatype, all of these steps are performed in a single run.

Editing method	Results	No results	Total
Exatype	126 (93%)	9 (7%)	135
Standard analysis Procedure	132 (98%)	3 (2%)	135

Table 4: Performance in generating consensus pol sequences for HIV-1 samples by the different editing approaches.

Nucleic acid sequence concordance between Exatype and Standard analysis procedure

Within analyzed bases, there was 99.8% overall agreement in base calling between Exatype and the gold standard. There was 99.6% complete sequence concordance within 311,227 nucleotide positions, as indicated in Fig. 1. Of the 311 discordant nucleotides, 308 (99%) were "partially discordant" (mixtures called by one method but not the other), while 3 (1%) were wholly discordant. Of the discordant bases, 76.5% (238 of 311) comprised nucleotide pairs resulting from transitions (R = A/G, Y = C/T) rather than transversions (K = G/T, M = A/C, S = C/G, W = A/T).
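The transition/transversion grouping of two-base mixture codes used above can be expressed as a small lookup. This is an illustrative sketch with hypothetical function names; codes for three or more bases and pure nucleotides fall into an "other" category that the text does not discuss.

```python
from collections import Counter

# Two-base IUPAC codes grouped by the kind of substitution they imply.
TRANSITION_CODES = {"R", "Y"}              # purine/purine or pyrimidine/pyrimidine
TRANSVERSION_CODES = {"K", "M", "S", "W"}  # purine/pyrimidine pairs


def mixture_type(code: str) -> str:
    """Return 'transition', 'transversion', or 'other' for one IUPAC code."""
    code = code.upper()
    if code in TRANSITION_CODES:
        return "transition"
    if code in TRANSVERSION_CODES:
        return "transversion"
    return "other"


def tally_mixture_types(codes) -> Counter:
    """Count mixture types over an iterable of IUPAC codes."""
    return Counter(mixture_type(code) for code in codes)


if __name__ == "__main__":
    # Toy list of discordant calls, not study data.
    print(tally_mixture_types("RYKA"))
```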

The distribution of discordant positions among transitions, transversions, and a combination of both was relatively even (n = 11, 6, and 5, respectively), as indicated in Fig. 1. Nucleotide mixtures were detected at 1.2% of all bases. Overall, the standard method called a marginally greater number of mixtures (1193 standard method-called mixtures [1.08%] and 1181 Exatype-called mixtures [1.05%]; P = 0.6).

Amino acid sequence concordance between Gold standard and Exatype interpretations

The 311 discordant nucleotide positions resulted in 284 discordant codons. Of these, 114 (40.1%) produced nonsynonymous differences between the standard and Exatype methods at the level of sequence to amino acid translation. Partial amino acid discordances (sharing at least one amino acid between the two interpretations) accounted for 278 (97.8%), while only 6 (2.2%) were complete amino acid differences.
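To make the notion of partial amino acid discordance concrete, the following sketch expands a codon that may contain IUPAC mixture codes into the set of amino acids it can encode and then compares the two sets. It is an illustration under stated assumptions rather than the procedure used by either tool: the function names are hypothetical, and the standard genetic code is built from the usual TCAG-ordered amino acid string.

```python
from itertools import product

# Standard genetic code, codons ordered TCAG at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC codes expanded to their component bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def amino_acids(codon: str) -> set:
    """Return every amino acid a (possibly ambiguous) codon can encode."""
    codon = codon.upper()
    expansions = product(IUPAC[codon[0]], IUPAC[codon[1]], IUPAC[codon[2]])
    return {CODON_TABLE["".join(bases)] for bases in expansions}


def classify_codon(standard: str, exatype: str) -> str:
    """Classify a codon pair as concordant, partial, or complete at the AA level."""
    a, b = amino_acids(standard), amino_acids(exatype)
    if a == b:
        return "concordant"  # identical or synonymous
    if a & b:
        return "partial"     # at least one shared amino acid
    return "complete"


if __name__ == "__main__":
    # Toy codons, not study data.
    print(amino_acids("AAM"), classify_codon("AAM", "AAA"))
```

For example, the codon AAM (AAA or AAC) encodes K or N, so comparing it with a pure AAA call gives a partial discordance, while AAA vs. AAG is synonymous and therefore concordant at the amino acid level.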

In general, the gold standard and Exatype sequence review identified 97 "key" antiretroviral drug resistance mutations²⁶, as either complete amino acid substitutions or as part of mixtures. The two methods agreed for 123 cases. Exatype identified one resistance mutation (E35D) that the gold standard did not, while the gold standard identified two (K55R and R57K) that Exatype did not. This variation in resistance mutation identification affected two patient results, though none of the three mutations has clinical significance.

Region	#	# with NT differences	Different mix^a	Different NT^b	Gap manual^c	Error manual^d	Error Exatype^e	# with AA differences	# with difference in resistance interpretation
PR	126	17	0	0	3	9	24	24	1(ANRS)
RT	126	17	4	1	1	8	31	54	1(ANRS); 2(REGA)

Table 5: Differences in Gold standard and Exatype editing of HIV-1 pol sequences from clinical samples and impact on drug resistance interpretation. We considered sequences that passed both Exatype and Gold standard editing. #, number of samples; NT, nucleotide; AA, amino acid; genotypic drug resistance interpretation systems: ANRS version 27, HIVDB version 8.9-1, and REGA version 8.0.2.

a Different mix: number of samples with mixtures scored differently by the two approaches.
b Different NT: number of samples with pure nucleotides scored differently by the two approaches.
c Gap manual: number of samples with parts of sequences that were not analyzable as judged by the editor.
d Error manual: number of samples containing differences between RECall and Exatype editing due to manual editing.
e Error Exatype: number of samples containing differences between Exatype and RECall editing due to errors made during automatic editing in Exatype.

Among the HIV-1 RNA measurement remnant samples, 93.3% (126/135) of the HIV-1 pol sequences had a consensus NT sequence generated by both Exatype and RECall (Table 5). In total, 86.5% (109/126) of the PR and 74.6% (94/126) of the RT sequences were fully concordant at the NT level, and similarly at the AA level. The differences in concordance between the two regions were attributed to the difference in coverage length and were less pronounced when normalized.

For each discordant NT call, the chromatograms were manually reviewed by a second laboratory specialist to verify whether the difference resulted from an erroneous call in the automatic or the manual editing process. Incorrect calls were observed for both editing approaches, i.e., in 24 vs. 9 samples for PR and 31 vs. 1 for RT (Table 5). Only 1 RT nucleotide was different between the manually and automatically edited sequences. In both instances, the differences resulted from mistakes made during manual editing. The operator trimmed the 5′ ends of PR in 3 samples and of RT in one sample, but these parts were still completely analyzed by RECall and not by Exatype. Additionally, some of the erroneous calls in Exatype arose because this tool does not allow sequence editing.

	# with NT differences compared to the reference sequence	# with AA differences	# with differences in resistance interpretation
	Total	Missed mix	False mix	Different NT/mix	Total	Missed mix	False mix	Different NT/mix	ANRS	HIVDB	REGA	ANRS	HIVDB	REGA
Exatype	22/10	5/4	2/2	3/3	1/-	2/-	1/-	-/-	3/4	2/-	-/-
Recall	22/12	5/3	1/1	3/2	1/-	-/2	-/-	-/2	-/-	2/3	-/-

Table 6: Differences in RECall and Exatype editing of HIV-1 pol sequences from all EQA samples and impact on drug resistance interpretation. These analyses were confined to drug resistance positions (PR: 10, 20, 24, 30, 32, 33, 36, 46, 47, 48, 50, 53, 54, 63, 71, 73, 77, 82, 84, 88, 90; RT: 41, 62, 65, 67, 69ins, 69, 70, 74, 75, 77, 100, 103, 106, 108, 115, 116, 151, 181, 184, 188, 190, 210, 215, 219, 225). #, number of samples; PR, protease; RT, reverse transcriptase; NT, nucleotide; AA, amino acid; genotypic drug resistance interpretation systems: ANRS version 27, HIVDB version 8.9-1, and REGA version 8.0.2. The number of sequences that passed Exatype and RECall editing is given before the slash. The number of sequences that did not pass either of the two approaches is given after the slash.

Missed mix: the number of samples with mixtures present in the reference sequence but not scored by the editing approach (pure wild-type or mutant NT).
False mix: the number of samples with mixtures scored by the editing approach that were not present according to the reference sequence (pure wild-type or mutant NT).
Different NT/mix: the number of samples with mixtures and pure nucleotides scored differently by the editing approach and the reference sequence.

EQA results analysis: 85% ((22 + 12 = 34)/40) of the EQA dry panels from WHO had a consensus sequence using RECall, while for Exatype it was 80% (32/40) (Table 6). The dry panels are sequence files that WHO shares with all WHO-accredited laboratories to assess staff competency in sequence editing. For each dry panel, a reference sequence sent by WHO was considered the accurate result; it was calculated from the consensus results of all participants within the WHO ResNet Lab group (∼52 participants). We further reviewed each discordant NT call to determine whether the difference resulted from a missed mixture, a false mixture, or a different NT or mixture (Table 6). RECall and Exatype were comparable in detecting mixtures, with almost similar numbers of scored mixtures that were not present in the reference sequence (Table 6).

	RECall		Exatype
	PR	RT	PR	RT
# sequences without NT differences	28/34(82%)	32/34(94%)	24/32(75%)	30/32(94%)
# sequences without AA differences	30/34(88%)	34/34(100%)	26/32(81%)	30/32(94%)
# NT differences/total # NT	9/2112(0.43%)	1/2562(0.04%)	18/2023(0.88%)	4/2400(0.17%)
# AA differences/total # AA	5/724(0.72%)	0/912(0%)	10/675(1.48%)	2/800(0.25%)

# M_e ∩ M_r	18	18	12	8
# M_r	21	19	16	11
P(M_e|M_r)	0.83	1	0.7	0.85
# M_e ∩ P_r	7	2	8	1
# P_r	2081	2679	1999	2381
P(M_e|P_r)	0.002	0.0008	0.004	0.0004

Table 7: Comparison of RECall and Exatype editing of the WHO dry sample EQA panel with the reference sequence at the NT and AA level. To meet the CLSI guidelines of 40% reference panels being EQA standards, we included dry panels from the WHO ResNet group. #, number of; AA, amino acids; NT, nucleotides; M_e, mixtures present in the results of the editing approach; M_r, mixtures present in the reference sequences; M_e ∩ M_r, mixtures present in the reference sequences that were scored as a mixture by the editing approach; P_r, pure nucleotides present in the reference sequences; P(M_e|M_r), the probability that a mixture was scored if present in the reference sequence; M_e ∩ P_r, pure nucleotides in the reference sequences that were scored as a mixture by the editing approach; P(M_e|P_r), the probability that a mixture was scored if no mixture was present in the reference sequence
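The two probabilities defined in the caption are simple ratios of counts: P(M_e|M_r) = #(M_e ∩ M_r) / #M_r and P(M_e|P_r) = #(M_e ∩ P_r) / #P_r. The sketch below shows one way to obtain the counts from an edited sequence and its reference. The function name is hypothetical, the sequences are assumed to be aligned and of equal length, N is treated as neither a mixture nor a pure nucleotide, and any mixture call at a reference mixture position counts as a hit, following the caption's wording.

```python
MIXTURE_CODES = set("RYKMSWBDHV")  # IUPAC ambiguity codes other than N
PURE_BASES = set("ACGT")


def mixture_scores(edited: str, reference: str) -> dict:
    """Compute mixture detection counts and probabilities against a reference."""
    if len(edited) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    m_r = p_r = me_and_mr = me_and_pr = 0
    for e, r in zip(edited.upper(), reference.upper()):
        edited_is_mix = e in MIXTURE_CODES
        if r in MIXTURE_CODES:
            m_r += 1
            me_and_mr += edited_is_mix
        elif r in PURE_BASES:
            p_r += 1
            me_and_pr += edited_is_mix
    return {
        "M_e_and_M_r": me_and_mr,
        "M_r": m_r,
        "P(M_e|M_r)": me_and_mr / m_r if m_r else None,
        "M_e_and_P_r": me_and_pr,
        "P_r": p_r,
        "P(M_e|P_r)": me_and_pr / p_r if p_r else None,
    }


if __name__ == "__main__":
    # Toy sequences, not study data.
    print(mixture_scores("ACRTY", "ACRTC"))
```

As a check against Table 7, the Exatype PR counts of 8 false mixtures among 1999 pure reference nucleotides give 8/1999 ≈ 0.004.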

At the NT level, the percentage of sequences without differences compared to the reference sequence was slightly lower for Exatype editing (75% and 94% for PR and RT, respectively) than for RECall editing (82% and 94%) (Table 7). Using RECall, 0.43% of the PR and 0.04% of the RT nucleotides were discordant with the reference sequence, compared with a markedly higher 0.88% of the PR and 0.17% of the RT nucleotides using Exatype. The same tendency was observed at the AA level (Table 7). For each editing approach, we then assessed the probability P(M_e|M_r) that a mixture was scored if the mixture was present in the reference sequence and the probability P(M_e|P_r) that a mixture was scored where the reference sequence had a pure nucleotide.

In the remnant HIV-1 RNA samples, the majority of samples for which at least one of the editing approaches was able to generate a consensus NT sequence were interpreted as susceptible to most PI, NRTI, and NNRTI. In addition, much more extensive drug resistance profiles were observed in the WHO dry panel than in the clinical dataset (Table 8).

Data set	According to	ANRS		HIVDB		REGA
		PI	RTI	PI	RTI	PI	RTI
Clinical	Exatype FASTA file	82/126 (65%)	51/126 (41%)	33/126 (26%)	42/126 (33%)	34/126 (27%)	41/126 (33%)
	RECall FASTA file	81/126 (64%)	52/126 (41%)	33/126 (26%)	42/126 (33%)	34/126 (27%)	43/126 (34%)
WHO Dry panel	Reference	18/40 (45%)	20/40 (50%)	18/40 (45%)	20/40 (50%)	17/40 (43%)	20/40 (50%)

Table 8: Number of samples displaying (intermediate) resistance to different drug classes, according to ANRS, HIVDB, REGA, and Geno2Pheno. For the HIV-1 RNA remnant dataset, we included only the sequences that passed for both RECall, and Exatype editing. In contrast, we included resistance information of all reference sequences for the WHO dry panel dataset. FPR, false-positive rate; RTI, reverse transcriptase inhibitor; PI, protease inhibitor; genotypic drug resistance interpretation systems: ANRS version 27, HIVDB version 8.9-1 for the clinical dataset and HIVDB version 8.9-1 for the EQA dataset, and REGA version 8.0.2, G2P Geno2Pheno

The study evaluated the performance characteristics of the Exatype sequence analysis and result generation tool developed by Hyrax Biosciences, in particular its ability to accurately analyze and interpret ABI sequence data into patient results. Exatype is freely available for Applied Biosystems™ HIV-1 Genotyping Kit users at sanger.exatype.com. We compared the results (FASTA files and patient results) generated by Exatype against our laboratory gold standard method (RECall and Stanford HIVDB). Using a set of 135 sequences, we assessed the proportion successfully analyzed by both methods, as well as the concordance in detection of ambiguous nucleotides, amino acid changes, and drug resistance mutations between the sequences and results generated by the gold standard and by Exatype. While the gold standard produced results for 132 samples, Exatype produced results for only 126. For the 126 (93%) samples with results available from both methods, concordance between the two methods was 99.8%, similar to other sequence analysis software comparisons^13,26. The minor differences were attributed to partial nucleotide discordance, in which one method detected a mixture and the other detected one component of the mixture. This consequently resulted in partial discordance for amino acids too.

Exatype and the gold standard had a concordance of 99.1% on NRTI/NNRTI resistance mutations. This is similar to the variability in sequence editing between personnel^9,27, depending on the sample tested. The one key resistance mutation mixture that was not detected by the gold standard and the two that were not detected by Exatype resulted from partial mismatches due to differential detection of nucleotide mixtures. Despite the high concordance, the inflexibility of a fully automated system may be a drawback of Exatype, as the two key mutations missed in this study show. Exatype, like the gold standard, marks unusual sequence positions, including mixtures, which a user can then visually inspect. In our case, Exatype was run without any human intervention.

The difference in the numbers of mixtures called between Exatype and the standard method was not statistically significant, making Exatype a vital data analysis standardization tool, especially in clinical reporting, which cannot be achieved by the gold standard method^28–30. Edits in Exatype are similarly traceable in a separate note pad within the batch system for all the results that are analyzed. This makes it compliant with Good Clinical Laboratory Practice (GCLP) standards, which call for traceability of data in the case of manual edits. Exatype also significantly improves the efficiency of HIV drug resistance genotyping and patient reporting, since it removes the manual data transfer across different software and the sequence editing that the gold standard currently requires.

The study indicates that Exatype editing comparatively underestimates the presence of mixtures relative to RECall. The discordances in Exatype within the pol sequences were limited to 0.17–1.48% at the nucleotide (NT) and amino acid (AA) level, with limited impact on drug resistance interpretations. RECall editing performed slightly better than Exatype editing, as it displayed the highest probability of scoring mixtures accurately (0.83–1 vs. 0.7–0.81) and the lowest probability of inaccurately assigning mixtures to pure nucleotides (0.002–0.0008). This low probability is attributable to the sequence editing that RECall allows. This study also highlighted the need for a second inspection, as erroneous calls were made not only during automatic but also during manual editing. In this respect, Exatype could be improved by allowing sequence editing before result generation.
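As an illustration of how mixture-calling probabilities of this kind can be estimated against a predefined reference, the sketch below computes two rates for a called sequence. The exact definitions used in the study are not restated in this section, so the definitions here are assumptions: the first rate is the fraction of reference mixture positions called as the identical mixture, and the second is the fraction of pure reference positions that were called as a mixture. The function name and sequences are hypothetical.

```python
PURE = set("ACGT")  # unambiguous nucleotides; any other IUPAC code is a mixture


def mixture_call_rates(reference, called):
    """Return (p_mixture_correct, p_pure_called_as_mixture).

    Both inputs must be aligned sequences of equal length. The
    definitions of the two rates are assumptions made for this
    sketch, not the study's documented formulas.
    """
    if len(reference) != len(called):
        raise ValueError("sequences must be aligned to the same length")

    mix_total = mix_correct = 0
    pure_total = pure_as_mix = 0
    for ref, call in zip(reference.upper(), called.upper()):
        if ref in PURE:
            pure_total += 1
            if call not in PURE:
                pure_as_mix += 1  # pure base wrongly reported as a mixture
        else:
            mix_total += 1
            if call == ref:
                mix_correct += 1  # mixture reported exactly as in the reference

    p_mix = mix_correct / mix_total if mix_total else float("nan")
    p_false = pure_as_mix / pure_total if pure_total else float("nan")
    return p_mix, p_false


# Toy data (not study data): the reference holds mixtures R and Y;
# the call recovers R, reports Y as C, and adds a spurious K.
p_mix, p_false = mixture_call_rates("ACRTYAGG", "ACRTCAGK")
```

With these toy inputs, one of the two reference mixtures is recovered and one of the six pure positions is reported as a mixture.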

RECall editing performed slightly better than Exatype editing, as it displayed the highest probability of scoring mixtures accurately (0.83–1 vs. 0.7–0.81) and the lowest probability of inaccurately assigning mixtures to pure nucleotides (0.002–0.0008). This is attributed to the sequence editing that RECall allows and its flexibility to accept single-primer coverage. Our results show that Exatype can provide an objective, standardized protocol for HIV sequence analysis in routine patient drug resistance testing and in research laboratories, although it should allow sequence editing before result generation to be comparable to RECall. The primary advantage of Exatype is its speed and the elimination of data transfer between different software packages, as it removes the sequence editing and MEGA X QA analysis steps. The system standardizes laboratory data analysis procedures and thus facilitates unbiased sequence interpretation. We did not factor in the software cost, especially for laboratories that do not use the Applied Biosystems™ HIV-1 Genotyping Kit, and this may limit scalability for such users.

The authors of this publication certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

Ethics approval and consent to participate

Amref Health Africa Research Ethics Committee approved the study (Ref. No. 4562). The research was conducted in accordance with the principles of the Declaration of Helsinki (2013) and Good Clinical Laboratory Practice. The study used a waiver of consent to conduct analyses on the remnant HIV viral load samples. Patients were receiving clinically significant results.

Consent for publication

Not applicable

Availability of data and material

All data contained within the article are publicly available.

Competing interests

None declared.

Funding

None

Authors' contributions

LK conceived the study, collected data, analyzed the data and drafted the manuscript; IM supervised data collection, contributed to data analysis and assisted in drafting and submission of the manuscript; CN, GK, KB, MNK, NB, VO, and DA participated in data collection and review of the manuscript; MK contributed to data analysis, drafting and critical revision of the manuscript. All authors approved the final version of the manuscript.

Acknowledgments

We acknowledge the Ministry of Health Kenya through the National Public Health Laboratory (NPHL) and the National AIDS and STI Control Program (NASCOP) for facilitating sample collection and allocating staff time to write this manuscript. We thank Peter Young, U.S. Centers for Disease Control and Prevention, Kenya, for reviewing a draft of this manuscript.

Baxter, J. D. et al. A randomized study of antiretroviral management based on plasma genotypic antiretroviral resistance testing in patients failing therapy. AIDS (2000) doi:10.1097/00002030-200006160-00001.
Meynard, J. L. et al. Phenotypic or genotypic resistance testing for choosing antiretroviral therapy after treatment failure: A randomized trial. AIDS (2002) doi:10.1097/00002030-200203290-00008.
DeGruttola, V. et al. The relation between baseline HIV drug resistance and response to antiretroviral therapy: Re-analysis of retrospective and prospective studies using a standardized data analysis plan. in Antiviral Therapy (2000). doi:10846592.
Vaerenbergh, K. Van. Study of the impact of HIV genotypic drug resistance testing on therapy efficacy. Verh K Acad Geneeskd Belg (2001).
Rhee, S. Y. et al. HIV-1 drug resistance mutations: Potential applications for point-of-care Genotypic resistance testing. PLoS One (2015) doi:10.1371/journal.pone.0145772.
Weinstein, M. C. et al. Use of genotypic resistance testing to guide HIV therapy: Clinical impact and cost-effectiveness. Ann. Intern. Med. (2001) doi:10.7326/0003-4819-134-6-200103200-00008.
Sendi, P. et al. Cost-effectiveness of genotypic antiretroviral resistance testing in HIV-infected patients with treatment failure. PLoS One (2007) doi:10.1371/journal.pone.0000173.
Chen, J. H. K. et al. Evaluation of an in-house genotyping resistance test for HIV-1 drug resistance interpretation and genotyping. J. Clin. Virol. (2007) doi:10.1016/j.jcv.2007.03.008.
Steegen, K. et al. Feasibility of detecting human immunodeficiency virus type 1 drug resistance in DNA extracted from whole blood or dried blood spots. J. Clin. Microbiol. (2007) doi:10.1128/JCM.00814-07.
Galli, R. A., Sattha, B., Wynhoven, B., O'Shaughnessy, M. V. & Harrigan, P. R. Sources and magnitude of intralaboratory variability in a sequence-based genotypic assay for human immunodeficiency virus type 1 drug resistance. J. Clin. Microbiol. (2003) doi:10.1128/JCM.41.7.2900-2907.2003.
Chroma, M. & Kolar, M. Genetic methods for detection of antibiotic resistance: Focus on extended-spectrum β-lactamases. Biomed. Pap. (2010) doi:10.5507/bp.2010.044.
Eshleman, S. H. et al. Performance of the Celera Diagnostics ViroSeq HIV-1 genotyping system for sequence-based analysis of diverse human immunodeficiency virus type 1 strains. J. Clin. Microbiol. (2004) doi:10.1128/JCM.42.6.2711-2717.2004.
Prosdocimi, F., Peixoto, F. C. & Ortega, J. M. DNA sequences bases calling by PHRED: error pattern analysis. Rev. Tecnol. da Informação (2003).
Hillier, L., Ewing, B., Green, P. & Wendl, M. C. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. (1998) doi:10.1101/gr.8.3.175.
Brown, L. B. et al. High levels of retention in care with streamlined care and universal test and treat in East Africa. AIDS (2016) doi:10.1097/QAD.0000000000001250.
Power, R. A. et al. Genome-wide association study of HIV whole genome sequences validated using drug resistance. PLoS One (2016) doi:10.1371/journal.pone.0163746.
Brumme, C. J. & Poon, A. F. Y. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping. Virus Research (2017) doi:10.1016/j.virusres.2016.12.008.
Lapointe, H. R. et al. HIV drug resistance testing by high-multiplex 'Wide' sequencing on the MiSeq instrument. Antimicrob. Agents Chemother. (2015) doi:10.1128/AAC.01490-15.
Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. (1998) doi:10.1101/gr.8.3.186.
Ewing, B., Hillier, L. D., Wendl, M. C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. (1998) doi:10.1101/gr.8.3.175.
Praparattanapan, J. et al. Comparison of in-house HIV-1 genotypic drug resistant test with commercial HIV-1 genotypic test kit. Asian Biomed. (2011) doi:10.5372/1905-7415.0502.032.
Smith, T. F. & Waterman, M. S. Identification of common molecular subsequences. J. Mol. Biol. (1981) doi:10.1016/0022-2836(81)90087-5.
Mardis, E. R. New strategies and emerging technologies for massively parallel sequencing: Applications in medical research. Genome Medicine (2009) doi:10.1186/gm40.
Bertelli, C. & Greub, G. Rapid bacterial genome sequencing: Methods and applications in clinical microbiology. Clinical Microbiology and Infection (2013) doi:10.1111/1469-0691.12217.
Cockerill, F. R. Genetic methods for assessing antimicrobial resistance. Antimicrobial Agents and Chemotherapy (1999) doi:10.1128/aac.43.2.199.
Liu, T. F. & Shafer, R. W. Web Resources for HIV Type 1 Genotypic-Resistance Test Interpretation. Clin. Infect. Dis. (2006) doi:10.1086/503914.
Shafer, R. W. et al. High degree of interlaboratory reproducibility of human immunodeficiency virus type 1 protease and reverse transcriptase sequencing of plasma samples from heavily treated patients. J. Clin. Microbiol. (2001) doi:10.1128/JCM.39.4.1522-1529.2001.
Merigan, T. C. et al. High Degree of Interlaboratory Reproducibility of Human Immunodeficiency Virus Type 1 Protease and Reverse Transcriptase Sequencing of Plasma Samples from Heavily Treated Patients. J. Clin. Microbiol. (2002) doi:10.1128/jcm.39.4.1522-1529.2001.
Shafer, R. W., Warford, A., Winters, M. A. & Gonzales, M. J. Reproducibility of human immunodeficiency virus type 1 (HIV-1) protease and reverse transcriptase sequencing of plasma samples from heavily treated HIV-1-infected individuals. J. Virol. Methods (2000) doi:10.1016/S0166-0934(00)00144-0.
Shafer, R. W., Warford, A., Winters, M. A. & Gonzales, M. J. Reproducibility of human immunodeficiency virus type 1 (HIV-1) protease and reverse transcriptase sequencing of plasma samples from heavily treated HIV-1-infected individuals. J. Virol. Methods (2000) doi:10.1016/S0166-0934(00)00144-0.
