Gene Coverage Count and Classification (GC3): a locus sequence coverage assessment tool using short-read whole genome sequencing data, and its application to identify and classify histidine-rich protein 2 and 3 deletions in Plasmodium falciparum

DOI: https://doi.org/10.21203/rs.3.rs-1862557/v1

Abstract

Background

The ability of malaria rapid diagnostic tests (RDTs) to effectively detect active infections is being compromised by the presence of malaria strains with genomic deletions at the hrp2 and hrp3 loci, encoding the antigens most commonly targeted in diagnostics for Plasmodium falciparum (Pf) detection. The frequency of such deletions can be determined in publically available Pf whole genome sequencing (WGS) datasets. A computational approach was developed and validated, termed Gene Coverage Count and Classification (GC3), to analyze genome-wide sequence coverage data and provide informative outputs to assess presence and coverage profile of a target locus in WGS data. GC3 was applied to detect deletions at hrp2 and hrp3 (hrp2/3) and flanking genes in different geographic regions and across time points.

Methods

GC3 uses Python and R scripts to extract locus read coverage metrics from mapped WGS data according to user-defined parameters and generates relevant tables and figures. GC3 was tested using WGS data for laboratory reference strains with known hrp2/3 genotype, and its results compared to those of a hrp2/3-specific qPCR assay. Samples with at least 25% of coding region positions with zero coverage were classified as having a deletion. Publicly available sequence data was analyzed and compared with published deletion frequency estimates.

Results

GC3 results matched the expected coverage of known laboratory reference strains. Agreement between GC3 and a hrp2/3-specific qPCR assay reported 19/19 (100%) hrp2 deletions and 18/19 (94.7%) hrp3 deletions. Among Cambodian (n = 127) and Brazilian (n = 20) WGS datasets, which had not been previously analyzed for hrp2/3 deletions, GC3 identified hrp2 deletions in three and four samples, and hrp3 deletions in 10 and 15 samples, respectively. Plots of hrp2/3 coding regions, grouped by year of sample collection, showed a decrease in median standardized coverage among Malawian samples (n = 150) suggesting an increase in frequency of deletions between 2007-08 and 2014-15. Among Malian (n = 90) samples, median standardized coverage was lower in 2002 than 2010, indicating widespread deletions present at the gene locus in 2002.

Conclusions

The GC3 tool accurately classifies hrp2/3 deletions and provided informative tables and figures to analyze targeted gene coverage. GC3 is an appropriate tool when performing preliminary and exploratory assessment of locus coverage data.

Background

From 2010 to 2020, national malaria control programs (NMCPs) distributed 2.2 billion rapid diagnostic tests (RDTs) for malaria and 3.1 billion RDTs were sold by manufacturers, the majority of these going to malaria-endemic countries in sub-Saharan Africa [1]. RDTs are an integral part of nearly all NMCP’s clinical and field interventions since they provide quick and effective malaria diagnosis. These RDTs include a small cassette detecting Plasmodium-specific antigens in the blood of an infected individual and are user-friendly and affordable [2]. Predominantly, RDTs detect the Plasmodium falciparum-specific antigen histidine-rich protein 2 (HRP2), which is released into the bloodstream in large quantities when infected red blood cells lyse [3]. P. falciparum accounts for vast majority of the 241 million reported human malaria cases in 2020 and is the primary parasite causing malaria-related mortality and morbidity [1]. Due to homology and considerable sequence similarity between the two proteins, HRP2-based RDTs can cross-react with histidine-rich protein 3 (HRP3) in high-density infections [4]. As evidence of their effectiveness, 94% of WHO-qualified RDTs are either HRP2-based or based on a combination of HRP2 and a partner antigen such as parasite lactate dehydrogenase or aldolase [57]. HRP2-based RDTs are an essential diagnostic tool for NMCPs to scale surveillance operations and adequately assess infection, leading to proper treatment administration, measure intervention progress and identify malaria reservoirs.

Recently, however, the effectiveness of HRP2-based RDTs is becoming compromised due to the emergence of deletions in the hrp2 and hrp3 (hrp2/3) loci that prevent the expression of a functional protein [5, 715]. In particular, full deletions, as well as some partial deletions, in one or both of these genes eliminate detectable HRP2 and/or HRP3, preventing accurate malaria diagnosis using RDTs. Previous estimates of hrp2/3 deletion prevalence report higher frequencies in South and Central America, followed by Africa, then Asia and Oceania [16]. Low-transmission areas with high treatment rates, characteristics often found in elimination settings, are especially at risk for the spread of strains with hrp2/3 gene deletions, as models show that, under those conditions, strains with hrp2/3 deletions have a strong fitness advantage over those with intact genes [17]. Therefore, as NMCPs continue to control and move toward elimination, it is critical to monitor the presence and spread of hrp2/3 deletions. Without fully understanding the dynamics of hrp2/3 deletions, and spread of those deletions in particular, undiagnosed infections may lead to an increase in malaria prevalence and mortality, and hinder global progress towards control and elimination.

A computational tool that facilitates detection and classification of deletions in hrp2/3 (e.g. partial vs. complete deletions) in published whole genome sequencing (WGS) datasets will enable rapid and detailed analysis and estimation of deletion prevalence within and between datasets. The development of baseline values as well as the comparison of deletion prevalences across current samples sets as well as temporal comparison between these and previously published datasets may be particularly useful. Previous studies have performed analyses using WGS data [1820]; however, implementation of the methods used in these studies requires a strong understanding of bioinformatics tools and packages. The development of a more user-friendly computational tool would expand the ability to implement surveillance of locus deletion based on WGS coverage data to a wider audience investigating and monitoring hrp2/3 deletions, as well as deletion genotypes in other target genes. This work aimed to fill this gap, by developing a computational tool, which we call “Gene Coverage Count and Classification”, or GC3, to provide translatable results of hrp2/3 deletion prevalence and classification, based on short-read WGS data, among global Pf samples for which WGS data is available.

Methods

Samples

The WGS data used were generated either by direct sequencing of total DNA extracted from each isolate or by sequencing post selective whole genome amplification (sWGA) of extracted DNA, and were reported previously [21]. Some of the WGS datasets were generated as part of the MalariaGEN project [22] and downloaded from the Sequence Read Archive (SRA). A selection of field samples representing 19 different countries from Africa (n = 9), South America (n = 5), Asia (n = 4) and Oceania (n = 1), for a total of 1120 datasets (1114 global samples + 6 reference strains), were evaluated for general results (Supplementary Figure S1). The following laboratory reference strains with known hrp2 and hpr3 genotype were used for developing and testing GC3: NF54 (West Africa) – hrp2 and hrp3 present, 7G8 (Brazil) – hrp2 and hrp3 present, NF135.C10 (Cambodia) – hrp2 and hrp3 present, NF166 (Guinea) – hrp2 and hrp3 present, Dd2 (Laos) – hrp2 absent/hrp3 present and HB3 (Honduras) – hrp2 present/hrp3 absent.

Read coverage files were generated by aligning raw reads in fastq format to the Pf3D7 reference genome assembly (PlasmoDB release v24) using bowtie (v2.2.9 and above). Alignment files in BAM (Binary sequence Alignment/Map) format were processed according to GATK’s (Genome Analysis Toolkit) Best Practices documentation. Genome-wide coverage per site was recovered using bedtools’ genomecov function [21, 23]. The resulting BED (Browser Extensible Data) file (a tab-delimited text file) is used as the initial input to GC3. However, any delimited file with columns for molecule identifier (e.g. Pf3D7_08_v3), chromosomal position and coverage value is acceptable. When comparing sample datasets, coverage values per base pair (bp) were standardized by dividing ‘locus coverage’ by ‘subtelomeric mean coverage’ to account for differences in sequencing depth among samples.

Computational tool framework and algorithm

For GC3 to function properly, Python v3.0 and R v4.1.1 (or later versions) with the following libraries must be installed: readxl, writexl, dplyr, reshape2, and ggplot2. GC3 uses a Python-based script to extract read coverage information for genomic coordinates set by the user and processes these output files using an R script (Fig. 1). Following the framework, the user is required to provide input parameters at two junctions.

1. Extracting target coverage data – Python-based script

Within the Python script, the user is required to provide parameter inputs depending on the desired output. To use the “sliding window” option, the user must provide: 1) name of input file, 2) start coordinate, 3) end coordinate, 4) molecule identifier containing target locus (e.g. Pf3D7_08_v3), 5) interval size (i.e. window length, in base pairs), and (6) step size (i.e. shift between windows, in base pairs). In the initial window, defined by start coordinate and interval size, the average coverage is obtained by adding read coverage across all positions and dividing by interval size. The start position is then updated by adding step size to the previous start coordinate and the process is repeated until the end coordinate is reached. The output file will report an interval’s start and end coordinates separated by a colon and flanked by apostrophes and the interval’s average read coverage separated from the coordinates by a colon (e.g. '1290240:1290740': 294.228).

If individual coverage of all positions in the interval of interest is desired, then the interval size should be set to 1, and GC3 will extract values for each coordinate between start and end coordinates, inclusively (step size is automatically set to 1). Output file will report the position and respective coverage (e.g. '1372236': 387). The intermediate output is a text file with position(s) and corresponding read coverage values.

Additionally, the user can calculate mean coverage between start and end coordinates using a separate GC3 function. User parameters needed are (1) name of input file, (2) start coordinate, (3) end coordinate, and (4) molecule identifier. This function is needed if the user desires to know, for example, the mean coverage over a wider region or to standardize coverage between different sets of samples (i.e., sample subgroups).

2. Coverage data processing – R-based script

The user will need to input intermediate output files into the separate GC3’s R script to clean, and generate sample metrics and descriptive plots. At the start of the R script, the user will define (1) path to the intermediate files, (2) name of the intermediate file(s), (3) target locus’ coordinates in reference genome, (4) gene’s intron coordinates (if necessary), in reference genome, (5) position coordinates of interest (e.g. flanking gene positions), (6) list of subgroup sample identifiers and subgroup name (optional). If read coverage is to be standardized relative to coverage in a reference chromosome or chromosomal segment, then a file of mean read coverage per chromosome or segment (obtained as described above) per sample should also be defined. The GC3 R script will output several files, namely, (i) Excel version of intermediate text files, (ii) summary metrics: sample identifier, overall mean coverage -- if mean coverage file included, mean target gene coverage, proportion of gene positions with coverage, proportion of smaller regions of interest (e.g. coding regions, exons, etc.), deletion classification, and count of positions with 0X coverage, (iii) read coverage information for target gene (number of positions with zero coverage, and mean and median coverage per position over all samples), and (iv) descriptive plots: sliding window coverage over region of interest (i.e. subtelomeric region), all coordinates coverage over target gene positions (i.e. hrp2 and hrp3), and proportion of positions with zero coverage.

Python and R scripts can be found at the Silva group’s GitHub (https://github.com/igs-jcsilva-lab) as well as a README file with detailed instructions and input examples.

Detection of hrp2, hrp3 and flanking deletions

The gene structure of hrp2 and hrp3 in the reference 3D7 strain was obtained from PlasmoDB (www.plasmodb.org). Both hrp2 and hrp3 consist of two coding exons. Exon1 is 69 bp in length for both genes, and exon2 is 848 bp long in hrp2 and 758 bp in hrp3 (Fig. 2). The analysis of WGS data focused on subtelomeric regions of chromosome 8 (Pf 3D7 reference strain coordinates 1,290,240–1,443,449, for a total of 153,209 bp), containing the hrp2 coding DNA sequence (CDS) and intervening intron, and of chromosome 13 (Pf 3D7 reference strain coordinates 2,731,041–2,892,340, for 161,299 bp), containing the hrp3 CDS and intron. Subtelomeric coordinates were chosen to include the closest “essential” gene [24] downstream of hrp2 or hrp3 and farthest upstream functional gene (i.e. PfEMP1-encoding var gene).

Metrics were generated to classify samples by presence or absence of full or partial deletions in each locus of interest. If ≤25% of the CDS was missing (i.e. at most 25% of the reference CDS had zero coverage) the locus was considered present with a “small deletion of uncertain functional impact” (SDUFI). If > 25% (but not 100%) of the reference CDS positions had zero coverage the sample was classified as having a partial deletion (25% < %-positions-with-zero-coverage < 100%); it was classified as having a complete deletion if all positions have zero coverage. This classification is partly informed by Sepúlveda and colleagues [18], who implemented an algorithm to perform deletion calling without having to analyze the coverage profile of the entire genome. They classified deletions based on a 75% threshold of positions with ≤ 2X coverage, but we considered this to be too stringent and decreased our thresholds as explained above to account for “SDUFIs” or partial deletions less than the 75% threshold that might impact protein detection. Deletions of flanking genes were assigned to samples if > 25% of the flanking gene’s positions reported zero coverage. Intergenic regions were excluded to reduce the effect of variable read coverages in non-coding regions.

GC3 agreement with qPCR assay

A previously described hrp2/3-specific qPCR assay capable of detecting locus deletions in mono- and poly-clonal infections [25] was utilized to compare with the hrp2/3 deletion genotype inferred by GC3. In summary, primer sequences were adapted from conventional PCR [26] to bind to conserved regions of hrp2, hrp3 and an apicomplexan-specific single copy gene used as positive control, rnr2e2 (ribonucleotide reductase R2_e2, [27]). The computational approach used by GC3 for the detection of hrp2/3 deletions was compared to this hrp2/3-specific qPCR assay [25], using the following samples:

  • NF54 – Positive control

  • 7G8 – Positive control

  • Dd2 – hrp2 absent control

  • HB3 – hrp3 absent control

  • 17 global samples (see Supplementary Table S1 for details)

The presence and classification of hrp2/3 deletions is reported for the four laboratory reference strains mentioned above and for 17 global samples from Brazil (n = 3), Cambodia (n = 6), Mali (n = 3), Malawi (n = 4) and Thailand (n = 1). Global samples with accessible DNA material were randomly selected to represent the following GC3-inferred genotype subgroups: samples with no deletions, hrp2 deletion (complete), hrp3 deletion (complete), double hrp2/3 deletion, low overall sample mean read coverage (< 20X), possible discordant pairs (partial deletion, with non-zero coverage in qPCR primer binding sites), and PCR primer site deletions (samples with zero coverage in qPCR primer binding site – either in hrp2 or hrp3). Accession ID and subgroup stratification of global samples can be found on Supplementary Table S1.

Statistical analyses

When measuring correlation between mean coverage in hrp2/3 positions and subtelomeric or upstream/downstream gene, Spearman’s rank correlation method was used (Supplementary materials). Spearman’s method accounts for non-parametric distribution and therefore mean coverages were not standardized [28]. R v4.1.1 program was used to conduct statistical analysis.

Results

Sample read coverages by sliding windows of 1000 bp intervals and 500 bp step size were generated over the subtelomeric regions of chromosome 8 and chromosome 13 (sum of coverage across all positions in interval / interval length). Additionally, coverage at every position (interval = 1) was generated at every position between coordinates 1,372,236 to 1,377,299 on chromosome 8 and 2,835,756 to 2,847,557 on chromosome 13. These positions corresponded to hrp2 and hrp3 coordinates plus 2000 bp on either end of their respective coding regions.

Demonstrating Gc Features Using Laboratory Strains Of Known Genotype

WGS data from reference laboratory strains were analyzed to estimate hrp2/3 coverage per bp and the proportion of positions with coverage by at least one read (≥ 1X coverage) at hrp2 and hrp3 coordinates, and ultimately evaluate the validity of results from GC3. Expected coverage was estimated using each respective subtelomeric region as reference. Overall, for each lab strain reference, we found excellent concordance between coverage values in each locus and the respective subtelomeric chromosomal regions (Table 1). Mean subtelomeric coverage of the Dd2 strain (with hrp2 deletion genotype) was high (chromosome 8: 29X; chromosome 13: 48X), and, as expected, mean coverage at the hrp2 positions was 0X, while hrp3 mean coverage was 45X (with 100% of CDS coordinates with coverage > 0). The HB3 strain (hrp3 deletion genotype) was sequenced to ~ 145X coverage (chromosome 08 and chromosome 13 subtelomeric regions with 159X and 131X coverage, respectively). Mean coverage at the hrp3 positions was 0X (50% proportional coverage) and while hrp2 was similar to genome-wide coverage (142X, with 100% proportional coverage of gene positions). Residual coverage may have occurred at the hrp3 gene of HB3 despite its known deletion due to mapping of some reads originating from hrp2 and mapping to similar but non-orthologous locations. GC3 correctly identified HB3 as having a hrp3 gene deletion (Table 1). These results are similar to previously described coverage profiles of Dd2 and HB3 [18].

Table 1

Subtelomeric coverage, gene coverage, and coding region proportional coverage among known reference strains.

 

Mean subtelomeric coverage

Mean gene coverage

Proportion of coding positions with ≥ 1X coverage (%)

Reference Strain

Chromosome 08

Chromosome 13

hrp2

hrp3

hrp2

hrp3

NF54

156.5

173.7

125.2

134.8

100%

100%

7G8

201.9

236.3

247.0

269.9

100%

100%

NF135.C10

35.3

42.7

36.9

41.7

100%

97%

NF166

280.0

328.4

295.7

344.6

100%

100%

Dd2

29.3

48.3

0.0

45.0

0%

100%

HB3

159.3

130.5

142.2

0.35

100%

50%

Plotting of read coverage in subtelomeric region of select reference strains

GC3 can create plots of the sliding window findings in order to provide a visual perspective of the target region. To illustrate the coverage data provided in Table 1, the subtelomeric regions containing hrp2 and hrp3 of reference strains Dd2, HB3 and NF54 were plotted (Fig. 3). Results were normalized on a log scale to better visualize large fluctuations in coverage generated by whole genome shotgun sequencing. Sliding window plots confirm validation results of Dd2, HB3 and NF54, and clearly illustrate coverage for each respective strain. On chromosome 08, coverage of the Dd2 strain decreases to zero for several thousand base pairs that include the hrp2 locus, whereas NF54 and HB3 have high coverage in the same region. On chromosome 13, it is the HB3 strain that has several thousand base pairs with little or no coverage, including the hrp3 locus, whereas NF54 and Dd2 coverage remains high. GC3 visuals showed no deletions in NF54, a complete hrp2 deletion in Dd2 and a large section of little to no coverage at the hrp3 locus in HB3, respectively.

GC3 agreement with a hrp2/3-specific qPCR assay on field samples

A subset of global samples (n = 17) and reference strains (n = 4) underwent qPCR specific for hrp2 and hrp3 to compare with GC3 results. Two among the selected global samples were excluded due to low parasitemia resulting in very low or no detection of the positive control gene by qPCR (Cq threshold cutoff = 37.5). There was very good agreement between GC3 (computational) and qPCR assay results. Out of four references strains and remaining 15 global samples, GC3 matched qPCR results 19/19 (100%) for hrp2 and 18/19 (94.7%) for hrp3 (Table 2). Only one sample (IGS-CBD-099) had a discordant result between methods. In particular, for this sample, GC3 classified it as having a partial deletion at the hrp3 locus, and coverage assessment with base-pair granularity suggested partial lack of read coverage, including the exon 2 primer binding region, between 2841390–2841412 (Fig. 4). It is noteworthy that the average coverage in this region is very low (~ 1X), however coverage is high for the corresponding chromosomal subtelomeric and core regions (~ 124X and ~ 143X, respectively). On the other hand, the qPCR assay was positive for hrp3 (Cq = 25.4). Taken together, the results suggest the sample has a partial deletion at the hrp3 locus, which does not encompass the qPCR primer binding regions, but that is possibly close enough to the binding site of the primer in exon 2 to interfere with read mapping in that region.

Table 2

Agreement between GC3 deletion assessment results on global samples and hrp2/3-specific qPCR assaya.

     

hrp2 b

hrp3 b

rnr2e2

(control gene)

Sample name

Country

Phenotype subgroup

GC3/PCR

GC3/PCR

PCR

7G8

Reference (Brazil)

Control

present/present

present/present

present

NF54

Reference

Control

present/present

present/present

present

Dd2

Reference (Laos)

Control - hrp2 deletion

absent/absent

present/present

present

HB3

Reference (Honduras)

Control - hrp3 deletion

present/present

absent/absent

present

IGS-BRA-017sA

Brazil

No deletions

present/present

present/present

present

IGS-THL-017

Thailand

No deletions

present/present

present/present

present

IGS-BRA-021

Brazil

No deletions

present/present

present/present

present

IGS-CBD-026

Cambodia

No deletions

present/present

present/present

present

IGS-CBD-031

Cambodia

hrp2 deletion (complete)

absent/absent

present/present

present

IGS-MLI-036

Mali

hrp3 deletion (complete)

present/present

absent/absent

present

IGS-BRA-001sA

Brazil

double hrp2/3 deletion

absent/absent

absent/absent

present

IGS-CBD-008

Cambodia

Low coverage sample

present/present

present/present

present

IGS-MWI-254sA

Malawi

Low coverage sample

present/present

present/present

present

IGS-MWI-251sA

Malawi

Low coverage sample

present/present

present/present

present

IGS-MLI-039

Mali

hrp2 discordant pair

present/present

present/present

present

IGS-MLI-031

Mali

hrp3 discordant pair

present/present

present/present

present

IGS-CBD-034

Cambodia

hrp2 PCR primer deletion

present/present

present/present

present

IGS-CBD-094

Cambodia

hrp3 PCR primer deletion

present/present

present/present

present

IGS-CBD-099

Cambodia

hrp3 PCR primer deletion

present/present

absent/present

present

a Cells in purple and blue denote agreement between GC3 and qPCR results (i.e. "PCR"); red denotes disagreement between methods.
b For GC3 assignments, if read coverage is present in > 75% of the CDS of a target gene, locus is designated as "present". When read coverage is absent for at least 25% of the CDS, a deletion is inferred, and locus is designated as "absent".

Comparison of GC3 output using Kenyan and Peruvian sample sets previously genotyped for hrp2/3 deletions

Previously, a subset of Kenyan samples (n = 27) was genotyped for hrp2/3 deletions, with two and one deletions identified in hrp2 and hrp3, respectively [18] (Table 3). In addition, Sepúlveda and colleagues also identified no hrp2 deletions among twelve Peruvian samples and two samples with deletions in hrp3 [18]. The criterion previously used to call deletions was > 75% of gene positions with ≤ 2X coverage [18]. We wanted to determine how GC3 performed on a similar set of samples. Accordingly, we downloaded from the SRA samples from the same time point from Kenya (n = 57, including 24 from [18]) and Peru (n = 11, including 7 from [18]). For hrp3, we identified the same number of deletions reported previously [18]. However, the hrp2 were discordant. We identified one complete deletion reported among Kenyan samples, and we observed one partial hrp2 deletion among the Peruvian samples, which differs from previous reporting. Among the Kenyan samples, a previous study reported two hrp2 deletions [18], one of which was also identified by GC3. Whereas the discordant Peruvian sample was only identified to have a deletion by GC3. The difference in assessment is likely due to differences in the criteria used between GC3 and that used by Sepúlveda and colleagues to call deletions.

Table 3

Deletion identification and classification of previously genotyped samples.

Country

n

hrp2

No Deletion

GC3 Deletion Classification (Partial/Complete)a

ex1-/ex2 + b

ex1+/ex2-b

ex1-/ex2-b

Previous Genotype (Deletion/Total)c

Kenya

59

58

0 / 1

0

0

1

2 / 27

Peru

11

10

1 / 0

1

0

0

0 / 12

Country

n

hrp3

No Deletion

GC3Deletion Classification (Partial/Complete)a

ex1-/ex2 + b

ex1+/ex2-b

ex1-/ex2-b

Previous Genotype (Deletion/Total)c

Kenya

59

58

1 / 0

0

0

1

1 / 27

Peru

11

9

2 / 0

0

1

1

2 / 12

ahrp2/3 deletions assigned to isolates with > 25% CDS positions with zero coverage.

bExon absence (-) assigned to isolates if > 25% exon positions have zero coverage. (+) signifies exon is present.

cPrevious hrp2/3 deletion genotype results from Sepúlveda and colleagues. Deletions called for samples with > 75% of coding region with ≤ 2X coverage [18].

Analysis of novel samples for hrp2 and hrp3 deletions

All global samples used in the study (n = 1114) were examined for hrp2/3 deletions (Supplementary Table S2). Cambodian (n = 127) and Brazilian (n = 20) samples were further visualized in more detail at the subtelomeric regions of interest (Supplementary Figure S2) and examined for hrp2/3 exon presence/absence since they have not previously been described (Table 4). Although these are undescribed samples, computational results are comparable to previous estimates in each respective region, where deletions have been previously observed [18, 29]. Among Cambodian samples collected in 2009–2011, there were one hrp2 deletion, eight hrp3 deletions, and two hrp2/3 double deletions (both hrp2 and hrp3), with prevalence of 0.8%, 6.3% and 1.6%, respectively. All hrp2 deletions corresponded to absent exons (> 25% zero coverage positions on both exons), but were classified as two partial and one complete hrp2 deletion as two samples still had coverage in a low proportion on hrp2 positions. Deletions of hrp3 among Cambodian samples were classified as seven partial and three complete deletions. Among Brazilian samples collected in 2016, four had hrp2/3 double deletions, and eleven with hrp3 deletions, corresponding to 20% and 55% prevalences respectively. Of the four hrp2 deletions, two were partial deletions, and two were complete hrp2 deletions. Of the 15 hrp3 deletions, one was a partial deletion on exon 2, and 14 were complete hrp3 deletions.

Table 4

Deletion identification and classification of undescribed samples.

Country

n

hrp2

No Deletion

Deletion Classification (Partial/Complete)a

ex1-/ex2 + b

ex1+/ex2-b

ex1-/ex2-b

Cambodia

127

124

2 / 1

0

0

3

Brazil

20

16

2 / 2

0

1

3

Country

n

hrp3

No Deletion

Deletion Classification (Partial/Complete)a

ex1-/ex2 + b

ex1+/ex2-b

ex1-/ex2-b

Cambodia

127

117

7 / 3

0

2

8

Brazil

20

5

1 / 14

0

1

14

ahrp2/3 deletions assigned to isolates with > 25% CDS positions with zero coverage.

bExon absence (-) assigned to isolates if > 25% exon positions have zero coverage. (+) signifies exon is present.

Quantifying flanking region deletions among Cambodian samples

Deletions at the hrp2/3 positions may extend to flanking genes, possibly with additional impact on overall parasite fitness. Therefore, an option in the GC3 R-script was built in to assign deletions of flanking genes. To determine whether deletions extend into these flanking coding regions, a table was generated for Cambodian samples, where samples with > 25% gene positions with zero coverage in upstream and downstream flanking regions were classified as having a locus deletion (Table 5). Of note are the observations that the presence of a hrp2/3 deletion is not always associated with deletions in flanking genes and, conversely, deletions in flanking genes are not always associated with hrp2/3 deletions. A subset of Cambodian samples has been plotted to illustrate flanking gene coverage as it relates to hrp2 or hrp3 (Supplementary Figure S3). Results suggest that deletions can occur independently in hrp2 (or hrp3) and their respective flanking genes.

Table 5

Frequency of deletions in hrp2/3 flanking genes, among Cambodian samples.

 

hrp2a

hrp3a

Flanking geneb

(Upstream / Downstream)c

Present

Absent

Present

Absent

Present / Present

124

2

113

3

Absent / Present

0

1

4

0

Present / Absent

0

0

0

1

Absent / Absent

0

0

0

6

Total

124

3

117

10

aDeletion assigned to samples if > 25% of coding region positions had zero coverage
bDeletion of flanking genes assigned to samples with > 25% gene positions with zero coverage
cFor hrp2, an upstream gene was a STEVOR family gene, and a downstream gene encoded heat shock protein 70. For hrp3, upstream and downstream genes were PHIST-encoding genes of unknown function.

Longitudinal standardized coverage of hrp2/3 positions

Coverage plots of hrp2/3 coordinates were generated for Cambodian, Malawian, and Malian samples to demonstrate: 1) magnified plots of only hrp2/3 positions and 2) differences in relative depth of coverage between samples collected at different time points (Fig. 5). To account for differences in sequencing depths between samples, coverage values were standardized. For each sample, site or locus coverage were divided by the expected coverage, obtained from mean coverage in subtelomeric region in which each locus is located (see Methods for coordinates). Standardized coverage of ~ 1 shows locus coverage similar to subtelomeric mean coverage. A strong, positive correlation between subtelomeric and hrp2/3 gene coverage justifies the use of this standardization approach (Supplementary Figures S4 and S5). The expectation of results is that standardized coverage over time will decrease if hrp2/3 deletions increase in frequency. To avoid undue impact of outlier standardized values, median standardized coverage was plotted per group.

Standardized read coverage for Cambodian samples were plotted alongside standardized coverage for the geographically representative strain NF135.C10 (Fig. 5A), and showed that the uneven standardized coverage of NF135.C10 is mirrored in the clinical samples. This suggests there are sequence-inherent properties that impact sequencing or mapping success. (See Supplementary Table S3 for descriptive coverage of each sample). In contrast, Malawian samples collected in 2014–2016 had lower median standardized coverage than the sample set collected in 2007–2008 (Fig. 5B), suggesting an increase in hrp2 and hrp3 deletions between the two time points. Interestingly, median standardized coverage is low (< 1) in both time points, showing that read coverage in the target genes is half of that in subtelomeric regions. The majority of Malawian samples underwent sWGA (n = 139) prior to sequencing (Supplementary Table S4) which may explain the lower standardized coverage as compared to standardized coverage of directly sequenced samples (Supplementary Figure S6). Of further interest, Mali samples from 2002 had lower standardized coverage than 2010 samples on both hrp2 and hrp3 positions (Fig. 5C). All Malian WGS data was obtained by direct sequencing of total DNA from venous blood, using a similar protocol [21, 30] with good coverage in the core genomes (Supplementary Table S5), implying the quality of WGS data did not contribute to this result. Overall, figures offer a visual perspective between different time points as monitoring of hrp2/3-deletions become crucial in the event of their expansion, but prior to any interpretation, the factors that influence WGS coverage should be considered.

Count Of Positions With Zero Coverage

To provide a clearer illustration of the proportional frequency of gene coordinates with zero coverage and the location of those positions along the locus, a view of hrp2 and hrp3 gene positions by proportional counts of no coverage (0X coverage) vs. coverage (≥ 1X coverage) was generated for Cambodian samples (Fig. 6). Supplementary Figure S7 provides the same proportional counts of zero coverage per position for Malawian and Mali samples. Among Cambodian samples, there is a clear increase in 0X coverage positions at the intron regions (hrp2 intron: 1375232–1375083; hrp3 intron: 2841636–2841484) relative to exon coverage. This is to be expected, as the length (145–148 bp) and the nucleotide composition of these Pf introns (hrp2 AT%: 91%; hrp3 AT%: 91.2%) prevent unambiguous mapping of 101 bp-long reads centered in the middle of the intron. On hrp3 positions, there are also two spikes in zero coverage positions on either side of coordinate 2,841,250, likely due to differences in Cambodia samples compared to Pf3D7 reference, such as indels or rapidly evolving sequence motifs among genetically similar Cambodian strains, which prevent read mapping in a subset of samples. That read mapping pattern is also observed in the troughs in hrp3 coverage plot in Fig. 5A, for samples collected in 2010 and 2011 (but curiously not in Pf NF135).

Discussion

Next-generation short-read WGS data has the capability to provide detailed genotype information, but often necessitates a good understanding and use of bioinformatics tools and packages. GC3 was developed to be a user-friendly computational tool to (1) extract coverage profiles of target genome regions, 2) provide interpretable results regarding deletion prevalence, 3) classify samples according to the type of gene deletions, and (4) inform NMCPs concerning coverage of important genes. In this study, we demonstrated that GC3 can be used for these purposes by applying it to Plasmodium falciparum genome segments, specifically the regions containing hrp2 and hrp3 genes. Most NMCPs in malaria-endemic settings rely on HRP2-based RDTs for day-to-day diagnosis in both clinical and field settings. There is evidence of recent expansions of Pf strains lacking HRP2 expression, a cause for concern as stated by the WHO [1, 19, 31, 32]. Computational toolkits, such as GC3, need to be able to efficiently analyze Pf genomic data and assess for the presence of hrp2- or hrp3-deletion strains, which provides a valuable monitoring resource to researchers and public health professionals concerned with malaria RDT effectiveness.

This work demonstrated the validity and utility of the GC3 tool to measure hrp2 and hrp3 deletion frequencies in Pf individual samples and in sample sets. The sliding window capability of GC3 provided a wider view of large deletions in reference lab strains (i.e. Dd2 and HB3) while adjusting for fluctuations in subtelomeric read coverage and allowed for visually-friendly figures. The option of extracting and plotting every chromosome position within an interval allowed for a magnified view of target loci, and better illustrated details in coverage within the target gene. GC3’s genotype results were validated against previously reported genotypes of Pf laboratory reference and representative strains, and similar, publicly available sample sets from Kenya and Peru, where hrp2/3 deletions have been observed [11, 18, 33]. Analysis of samples from Cambodia and Brazil demonstrate GC3’s capability to process novel WGS data from regions other than West Africa, the location of origin of PfNF54, the parental isolate from which the reference 3D7 was cloned [34, 35]. Results are consistent with previous estimates of hrp2/3 deletions among each respective country [19, 29, 36]. Of note is the high prevalence of all deletions, especially hrp3 deletions among Brazilian samples. Deletions in hrp2/3, and reports of high deletion prevalence, were first observed in the American continent [7, 8, 16, 18, 3638], especially hrp3 deletions [7, 39]. Overall, GC3 can appropriately process and analyze publicly available WGS datasets from a variety of genomic studies.

Additional comparison against a hrp2/3-specific qPCR assay demonstrated very good reliability of GC3’s capability. Although there was one discordant result between tools, this may be a reflection of GC3’s sensitivity and the qPCR assay’s difficulty to detect partial deletions, as these are only detected by the qPCR-based assay if they overlap the primer-binding or amplicon sites. A potential challenge for GC3 are the samples with a very low amount of parasite DNA resulting on overall low depth of coverage and uneven representation of the locus in the genomic library and/or among the WGS data, leading to significant regions of the hrp2/3 loci with zero coverage (and then a ‘deletion’ assessment by GC3), despite the loci being present in the genome. However, this situation was not observed in our study. In the specific case of discordant results in a sample from Cambodia, the sample had very high coverage at the core and subtelomeric regions. In general, partial deletions present a challenge since some cases have shown a qPCR assay can amplify part of hrp2/3, but corresponds to false negative RDT diagnoses [40, 41]. Ultimately, there were very few such samples, so their impact on overall results is considered minimal. Although beyond the scope of this study, further examination of hrp2/3 partial deletions, their specific location within the locus and their respective RDT diagnosis may provide valuable information regarding the most appropriate criteria and thresholds to accurately identify gene deletions with a functional phenotype, i.e., those deletions that abrogate protein expression.

Among Cambodian samples, read coverage in flanking genes was further analyzed and demonstrated that hrp2/3 deletions can be restricted to just the locus proper, or extend to flanking genes, but without a discernible pattern. These results are consistent to previous reports [18, 19, 39, 42, 43]. Overall, GC3 reported similar results in deletion prevalence and classification trends among global malaria-endemic regions.

Utilizing the function of GC3 to extract all positions (i.e. interval = 1, step size = 1) and plotting hrp2/3 coding positions only, coverage is clearer and subsets of samples can be compared, if desired. When using this option to compare countrywide samples collected at different time points, interestingly 2002 Malian samples had lower standardized coverage than older samples (2002 relative to 2010) for both hrp2 and hrp3. This result was unexpected, since the frequency of deletions is expected to increase over time, due to the selection imposed by parasite detection by RDT. Given the size of the sample subgroups, consistent results across both hrp2 and hrp3, and generation of WGS data directly from total DNA extracted from venous whole blood in both 2002 [30] and 2010 [21], it is improbable that differences in read mapping success would cause such a difference in coverage. Interestingly, hrp2 deletions had already been observed in Mali in the late 1990s prior to significant RDT use in the country, indicative of spontaneous deletions that were maintained despite the absence pressure from RDT usage [44]. In fact, random polymorphisms can occur naturally in the subtelomeric region, including large deletions, without evolutionary pressure [45], Further, the fitness cost associated with hrp2 loss is not significant, although some small cost is associated with hrp3 deletions [3, 18]. Considering previous observations and minimal fitness cost associated with hrp2/3, GC3’s Mali results are possibly due to the expansion of a circulating genotype with naturally occurring hrp2/3 deletions in 2002 that were less prevalent or non-existent by 2010. Ultimately, this observation, together with the high frequency of hrp2 and, especially, hrp3 deletions in Brazilian isolates, suggest that there may be factors that impact deletion rate at these loci or forces, other than RDT usage, that favor strains with hrp2/3 deletions.

Closer examination of the proportional coverage at each hrp2/3 position among Cambodian samples, revealed in each gene positions with zero read coverage resulting from mapping artifacts, and which may confound results. On hrp2 positions, zero coverage positions increase and then spike around the intron region which, as mentioned before, is likely due to the intron’s high AT content [46] that can cause challenges for read mapping. Interestingly, there were two spikes in zero coverage positions on the hrp3 locus. Examination of a previous whole genome hierarchical cluster analysis of the same Cambodian samples [47] revealed that the majority of samples contributing to one or both peaks belong to the same Cambodian subpopulation and hence share a similar genetic background. Ultimately, the figure offers a useful preliminary view of the hrp2/3 genes and their characteristics.

Some limitations exist when analyzing the GC3’s results. Firstly, the quality of computational results is influenced by the depth of the P. falciparum sequencing data, as measured by the total number of reads mapped to the reference genome. In the case of P. falciparum, we have observed that the percentage of genome with coverage asymptotes at ~ 12 million 100-bp reads mapped to the parasite genome, averaging ~ 52X coverage genome-wide. It is also recommended the user apply GC3 to calculate mean coverage over the region/chromosome where the target gene is located to estimate expected coverage at the general area of the target gene. Further, high-quality WGS data results obtained with well-described DNA extraction methods and sequencing methods, either by direct sequencing or sWGA [21, 23], and established quality control and filtering protocols should be used when comparing samples from different studies. While comparing standardized coverage between direct and sWGA sequence data, direct sequencing achieves more uniform coverage due to the inefficient amplification on the subtelomeric region by sWGA primers [23], but sWGA still provides good coverage at hrp2/3 gene positions (Supplementary Figure S6). Even with high-quality data, and pertinent to hrp2/3 deletion detection, polyclonal infections add another layer of complexity since they can mask the lack of coverage at a target gene, especially in high transmission regions [17]. Particularly if GC3 is the only method being implemented to assess for deletions, this factor would need to be considered. The deletion criteria can be easily adjusted to be more stringent by the user depending on their purposes, much like the deletion criteria used in the comparator study [18]. Despite the limitations, GC3 appropriately processed hrp2/3 coverage data and classified deletions. Its utility can be extended to analyze and visualize coverage data of any target gene on any pathogen.

Summary statement

Overall, validation of the GC3 tool to extract and process WGS data was successful when comparing with expected results using reference strains, well-described samples and a hrp2/3-specific qPCR assay. Following our criteria for identifying deletions, the results agreed with previous estimations of hrp2/3 deletion frequency in each respective country. Apparent in the results is the level of detail that can be extrapolated from short-read WGS data and viewed using a comprehensive computational tool. Although challenges persist in ensuring high-quality WGS data and achieving similar coverage among low parasitemia samples using sWGA, we expect GC3’s results to be fairly robust. Further investigation of the partial deletions threshold that results in a false negative RDT diagnosis is needed to validate the deletion criteria. Ultimately, groups investigating a target gene’s coverage can use GC3 to efficiently generate translatable results and figures to understand and interpret broad patterns using hundreds to thousands of previously generated genomic datasets.

List Of Abbreviations

NMCP

National Malaria Control Program

RDTs

Rapid Diagnostic Tests

hrp2/3

loci encoding histidine-rich proteins 2 and 3

Pf

Plasmodium falciparum

WGS

Whole genome sequencing

sWGA

selective whole genome amplification

SRA

Sequence Read Archive

BAM

Binary sequence Alignment/Map

GATK

Genome Analysis Toolkit

BED

Browser Extensible Data

bp

base pair

CDS

coding DNA sequence

Declarations

Ethics approval and consent to participate

All WGS datasets used are publicly available through GenBank. P. falciparum samples for which DNA material was utilized to perform qPCR were collected during field studies in Brazil, Malawi, Mali, and Thailand. All samples were collected after recruitment/enrollment and with written informed consent from the subject or a parent/guardian, and assent obtained from children under 18 years old. Informed consent included authorization to generate parasite genomic data. The respective field studies were approved by the following Institutional review boards (IRBs): Institute of Biomedical Sciences, University of São Paulo (1368/17/CEPSH), for collection in Acre, Brazil; Malawi National Health Sciences Research Committee, and the IRB at the University of Maryland, Baltimore (FWA#00005976), for collection in Malawi; Comité D’Éthique de la FMPOS (Faculté de Médecine de Pharmacie et d’Odonto-stomatologie), Université des Sciences, des Techniques, et des Technologies de Bamako, Mali, and the IRB at the University of Maryland, Baltimore (HP-00041382), for collection on Mali; Ethics Review Committee of the Department of Medical Research, Human Subject Protection Branch (HSPB) of the Walter Reed Army Institute of Research, Silver Spring, MD (FWA# 00000015, IRB# 00000794), and Institute for the Development of Human Research Protection (IHRP), Ministry of Public Health, Thailand (FWA# 00017503, IRB#00006539), for collections in Thailand. This study conforms to the principles established in the Helsinki Declaration. 

Consent for publication

Not applicable

Availability of data and materials

The datasets from field studies are available from public repositories, with access information reported previously [21].

Competing interests

The authors declare that they have no competing interests.

Funding

This work was funded in part by the National Institutes of Health (NIH) awards U19 AI110820 and R01 AI141900 to JCS. AO received funding from the National Heart, Lung and Blood Institute (1K01HL140285-01A1). 

Author’s contributions

TCS conceived the study, designed the bioinformatics toolkit, performed the analyses, interpreted the results and wrote the paper. AD supported the development of GC3, contributed to the interpretation of results, and edited the manuscript. JCS conceived and designed the study and bioinformatics toolkit, interpreted the results, contributed funding and resources, and wrote the paper. BS and SJ were responsible for all laboratory procedures and edited the paper. TS designed the hrp2/3-specific qPCR protocol and edited the paper. AO contributed to the interpretation of results and edited the paper. GG and CD contributed fundamental resources to the study and reviewed the paper. All authors read and approved the final version of the manuscript.

Acknowledgements

We would like to thank the study teams that supported the collection of samples in the Brazil, Malawi, Mali, and Myanmar and Thailand studies. Specifically, we would like to acknowledge the generosity of the UMB PIs from each respective study, namely Dr. Miriam Laufer, Dr. Mark Travassos, and Dr. Shannon Takala-Harrison as well as the University of Sao Paulo PI, Dr. Marcelo Urbano Ferreira. Parasite isolate DNA from Cambodia and Thailand was derived from clinical samples collected by the Department of Bacterial and Parasitic Diseases at the Armed Forces Research Institute of Medical Sciences in collaboration with Dr. Shannon Takala-Harrison and had undergone whole genome sequencing as part of a previous study [48]. Malawian isolates were generated from samples by the Mfera Health Center in the Chikhwawa district from a previous investigation [49]. Malian sequences comes from previous studies conducted in collaboration with the Bandiagara Malaria Project [21, 30]. Brazilian samples were collected by the University of Sao Paulo Department of Public Health Practices [21]. Accession numbers of samples are provided in the supplementary materials Table S1. 

References

  1. WHO: World malaria report 2021. Geneva: World Health Organization; 2021.
  2. Mukkala AN, Kwan J, Lau R, Harris D, Kain D, Boggild AK: An Update on Malaria Rapid Diagnostic Tests. Current Infectious Disease Reports 2018, 20:49.
  3. Poti KE, Sullivan DJ, Dondorp AM, Woodrow CJ: HRP2: Transforming Malaria Diagnosis, but with Caveats. Trends in Parasitology 2020, 36:112–126.
  4. Kong A, Wilson SA, Ah Y, Nace D, Rogier E, Aidoo M: HRP2 and HRP3 cross-reactivity and implications for HRP2-based RDT use in regions with Plasmodium falciparum hrp2 gene deletions. Malaria Journal 2021, 20:207.
  5. Thomson R, Parr JB, Cheng Q, Chenet S, Perkins M, Cunningham J: Prevalence of Plasmodium falciparum lacking histidine-rich proteins 2 and 3: a systematic review. Bulletin of the World Health Organization 2020, 98:558-568F.
  6. Kozycki CT, Umulisa N, Rulisa S, Mwikarago EI, Musabyimana JP, Habimana JP, Karema C, Krogstad DJ: False-negative malaria rapid diagnostic tests in Rwanda: impact of Plasmodium falciparum isolates lacking hrp2 and declining malaria transmission. Malaria Journal 2017, 16:123.
  7. Gendrot M, Fawaz R, Dormoi J, Madamet M, Pradines B: Genetic diversity and deletion of Plasmodium falciparum histidine-rich protein 2 and 3: a threat to diagnosis of P. falciparum malaria. Clinical Microbiology and Infection 2019, 25:580–585.
  8. Agaba BB, Yeka A, Nsobya S, Arinaitwe E, Nankabirwa J, Opigo J, Mbaka P, Lim CS, Kalyango JN, Karamagi C, Kamya MR: Systematic review of the status of pfhrp2 and pfhrp3 gene deletion, approaches and methods used for its estimation and reporting in Plasmodium falciparum populations in Africa: review of published studies 2010–2019. Malaria Journal 2019, 18:355.
  9. Akinyi Okoth S, Abdallah JF, Ceron N, Adhin MR, Chandrabose J, Krishnalall K, Huber CS, Goldman IF, Macedo de Oliveira A, Barnwell JW, Udhayakumar V: Variation in Plasmodium falciparum Histidine-Rich Protein 2 (Pfhrp2) and Plasmodium falciparum Histidine-Rich Protein 3 (Pfhrp3) Gene Deletions in Guyana and Suriname. PLOS ONE 2015, 10:e0126805.
  10. Berzosa P, González V, Taravillo L, Mayor A, Romay-Barja M, García L, Ncogo P, Riloha M, Benito A: First evidence of the deletion in the pfhrp2 and pfhrp3 genes in Plasmodium falciparum from Equatorial Guinea. Malaria Journal 2020, 19:99.
  11. Beshir KB, Sepúlveda N, Bharmal J, Robinson A, Mwanguzi J, Busula AO, de Boer JG, Sutherland C, Cunningham J, Hopkins H: Plasmodium falciparum parasites with histidine-rich protein 2 (pfhrp2) and pfhrp3 gene deletions in two endemic regions of Kenya. Scientific Reports 2017, 7:14718.
  12. Funwei R, Nderu D, Nguetse CN, Thomas BN, Falade CO, Velavan TP, Ojurongbe O: Molecular surveillance of pfhrp2 and pfhrp3 genes deletion in Plasmodium falciparum isolates and the implications for rapid diagnostic tests in Nigeria. Acta Trop 2019, 196:121–125.
  13. Kumar N, Pande V, Bhatt RM, Shah NK, Mishra N, Srivastava B, Valecha N, Anvikar AR: Genetic deletion of HRP2 and HRP3 in Indian Plasmodium falciparum population and false negative malaria rapid diagnostic test. Acta Tropica 2013, 125:119–121.
  14. Thomson R, Beshir KB, Cunningham J, Baiden F, Bharmal J, Bruxvoort KJ, Maiteki-Sebuguzi C, Owusu-Agyei S, Staedke SG, Hopkins H: pfhrp2 and pfhrp3 Gene Deletions That Affect Malaria Rapid Diagnostic Tests for Plasmodium falciparum: Analysis of Archived Blood Samples From 3 African Countries. The Journal of Infectious Diseases 2019, 220:1444–1452.
  15. Verma AK, Bharti PK, Das A: HRP-2 deletion: a hole in the ship of malaria elimination. The Lancet Infectious Diseases 2018, 18:826–827.
  16. Molina-de la Fuente I, Pastor A, Herrador Z, Benito A, Berzosa P: Impact of Plasmodium falciparum pfhrp2 and pfhrp3 gene deletions on malaria control worldwide: a systematic review and meta-analysis. Malaria Journal 2021, 20:276.
  17. Watson OJ, Slater HC, Verity R, Parr JB, Mwandagalirwa MK, Tshefu A, Meshnick SR, Ghani AC: Modelling the drivers of the spread of Plasmodium falciparum hrp2 gene deletions in sub-Saharan Africa. eLife 2017, 6:e25008.
  18. Sepúlveda N, Phelan J, Diez-Benavente E, Campino S, Clark TG, Hopkins H, Sutherland C, Drakeley CJ, Beshir KB: Global analysis of Plasmodium falciparum histidine-rich protein-2 (pfhrp2) and pfhrp3 gene deletions using whole-genome sequencing data and meta-analysis. Infection, Genetics and Evolution 2018, 62:211–219.
  19. Gibbons J, Qin J, Malla P, Wang Z, Brashear A, Wang C, Miao J, Adams JH, Kim K, Jiang R, Cui L: Lineage-Specific Expansion of Plasmodium falciparum Parasites With pfhrp2 Deletion in the Greater Mekong Subregion. The Journal of Infectious Diseases 2020, 222:1561–1569.
  20. Feleke SM, Reichert EN, Mohammed H, Brhane BG, Mekete K, Mamo H, Petros B, Solomon H, Abate E, Hennelly C, et al: Emergence and evolution of Plasmodium falciparum histidine-rich protein 2 and 3 deletion mutant parasites in Ethiopia. 2021:2021.2001.2026.21250503.
  21. Moser KA, Drábek EF, Dwivedi A, Stucke EM, Crabtree J, Dara A, Shah Z, Adams M, Li T, Rodrigues PT, et al: Strains used in whole organism Plasmodium falciparum vaccine trials differ in genome structure, sequence, and immunogenic potential. Genome Medicine 2020, 12:6.
  22. Malaria GENPfCP: Genomic epidemiology of artemisinin resistant malaria. eLife 2016, 5:e08714.
  23. Oyola SO, Ariani CV, Hamilton WL, Kekre M, Amenga-Etego LN, Ghansah A, Rutledge GG, Redmond S, Manske M, Jyothi D, et al: Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification. Malaria Journal 2016, 15:597.
  24. Zhang M, Wang C, Otto TD, Oberstaller J, Liao X, Adapa SR, Udenze K, Bronner IF, Casandra D, Mayho M, et al: Uncovering the essential genes of the human malaria parasite Plasmodium falciparum by saturation mutagenesis. 2018, 360:eaap7847.
  25. Schindler T, Deal AC, Fink M, Guirou E, Moser KA, Mwakasungula SM, Mihayo MG, Jongo SA, Chaki PP, Abdulla S, et al: A multiplex qPCR approach for detection of pfhrp2 and pfhrp3 gene deletions in multiple strain infections of Plasmodium falciparum. Scientific Reports 2019, 9:13107.
  26. Abdallah JF, Okoth SA, Fontecha GA, Torres REM, Banegas EI, Matute ML, Bucheli STM, Goldman IF, de Oliveira AM, Barnwell JW, Udhayakumar V: Prevalence of pfhrp2 and pfhrp3 gene deletions in Puerto Lempira, Honduras. Malaria Journal 2015, 14:19.
  27. Munro JB, Jacob CG, Silva JC: A novel clade of unique eukaryotic ribonucleotide reductase R2 subunits is exclusive to apicomplexan parasites. J Journal of molecular evolution 2013, 77:92–106.
  28. Xiao C, Ye J, Esteves RM, Rong C: Using Spearman's correlation coefficients for exploratory data analysis on big dataset. 2016, 28:3866–3878.
  29. Rachid Viana GM, Akinyi Okoth S, Silva-Flannery L, Lima Barbosa DR, Macedo de Oliveira A, Goldman IF, Morton LC, Huber C, Anez A, Dantas Machado RL, et al: Histidine-rich protein 2 (pfhrp2) and pfhrp3 gene deletions in Plasmodium falciparum isolates from select sites in Brazil and Bolivia. PLOS ONE 2017, 12:e0171150.
  30. Lyke KE, Burges RB, Cissoko Y, Sangare L, Kone A, Dao M, Diarra I, Fernández-Viňa MA, Plowe CV, Doumbo OK, Sztein MB: HLA-A2 Supertype-Restricted Cell-Mediated Immunity by Peripheral Blood Mononuclear Cells Derived from Malian Children with Severe or Uncomplicated Plasmodium falciparum Malaria and Healthy Controls. 2005, 73:5799–5808.
  31. Golassa L, Messele A, Amambua-Ngwa A, Swedberg G: High prevalence and extended deletions in Plasmodium falciparum hrp2/3 genomic loci in Ethiopia. PLOS ONE 2020, 15:e0241807.
  32. Berhane A, Anderson K, Mihreteab S, Gresty K, Rogier E, Mohamed S, Hagos F, Embaye G, Chinorumba A, Zehaie A, et al: Major Threat to Malaria Control Programs by Plasmodium falciparum Lacking Histidine-Rich Protein 2, Eritrea. Emerging infectious diseases 2018, 24:462–470.
  33. Gamboa D, Ho M-F, Bendezu J, Torres K, Chiodini PL, Barnwell JW, Incardona S, Perkins M, Bell D, McCarthy J, Cheng Q: A Large Proportion of P. falciparum Isolates in the Amazon Region of Peru Lack pfhrp2 and pfhrp3: Implications for Malaria Rapid Diagnostic Tests. PLOS ONE 2010, 5:e8091.
  34. Walliker D, Quakyi IA, Wellems TE, McCutchan TF, Szarfman A, London WT, Corcoran LM, Burkot TR, Carter R: Genetic Analysis of the Human Malaria Parasite Plasmodium falciparum. 1987, 236:1661–1666.
  35. Preston MD, Campino S, Assefa SA, Echeverry DF, Ocholla H, Amambua-Ngwa A, Stewart LB, Conway DJ, Borrmann S, Michon P, et al: A barcode of organellar genome polymorphisms identifies the geographic origin of Plasmodium falciparum strains. Nat Commun 2014, 5:4052.
  36. Góes L, Chamma-Siqueira N, Peres JM, Nascimento JM, Valle S, Arcanjo AR, Lacerda M, Blume L, Póvoa M, Viana G: Evaluation of Histidine-Rich Proteins 2 and 3 Gene Deletions in Plasmodium falciparum in Endemic Areas of the Brazilian Amazon. 2021, 18:123.
  37. Bhasin VK, Trager W: Gametocyte-forming and non-gametocyte-forming clones of Plasmodium falciparum. The American journal of tropical medicine and hygiene 1984, 33:534–537.
  38. Vera-Arias CA, Holzschuh A, Oduma CO, Badu K, Abdul-Hakim M, Yukich J, Hetzel MW, Fakih BS, Ali A, Ferreira MU, et al: Plasmodium falciparum hrp2 and hrp3 gene deletion status in Africa and South America by highly sensitive and specific digital PCR. 2021:2021.2006.2001.21258117.
  39. Wurtz N, Fall B, Bui K, Pascual A, Fall M, Camara C, Diatta B, Fall KB, Mbaye PS, Diémé Y, et al: Pfhrp2 and pfhrp3 polymorphisms in Plasmodium falciparum isolates from Dakar, Senegal: impact on rapid malaria diagnostic tests. Malaria Journal 2013, 12:34.
  40. Houzé S, Hubert V, Pessec GL, Bras JL, Clain J: Combined Deletions of pfhrp2 and pfhrp3 Genes Result in Plasmodium falciparum Malaria False-Negative Rapid Diagnostic Test. 2011, 49:2694–2696.
  41. Nima MK, Hougard T, Hossain ME, Kibria MG, Mohon AN, Johora FT, Rahman R, Haque R, Alam MS: Case Report: A Case of Plasmodium falciparum hrp2 and hrp3 Gene Mutation in Bangladesh. The American journal of tropical medicine and hygiene 2017, 97:1155–1158.
  42. Li P, Xing H, Zhao Z, Yang Z, Cao Y, Li W, Yan G, Sattabongkot J, Cui L, Fan Q: Genetic diversity of Plasmodium falciparum histidine-rich protein 2 in the China–Myanmar border area. Acta Tropica 2015, 152:26–31.
  43. Parr JB, Verity R, Doctor SM, Janko M, Carey-Ewend K, Turman BJ, Keeler C, Slater HC, Whitesell AN, Mwandagalirwa K, et al: Pfhrp2-Deleted Plasmodium falciparum Parasites in the Democratic Republic of the Congo: A National Cross-sectional Survey. The Journal of Infectious Diseases 2016, 216:36–44.
  44. Koita OA, Doumbo OK, Ouattara A, Tall LK, Konaré A, Diakité M, Diallo M, Sagara I, Masinde GL, Doumbo SN, et al: False-negative rapid diagnostic tests for malaria and deletion of the histidine-rich repeat region of the hrp2 gene. The American journal of tropical medicine and hygiene 2012, 86:194–198.
  45. de Bruin D, Lanzer M, Ravetch JV: The polymorphic subtelomeric regions of Plasmodium falciparum chromosomes contain arrays of repetitive sequence elements. 1994, 91:619–623.
  46. Hamilton WL, Claessens A, Otto TD, Kekre M, Fairhurst RM, Rayner JC, Kwiatkowski D: Extreme mutation bias and high AT content in Plasmodium falciparum. Nucleic Acids Research 2016, 45:1889–1901.
  47. Agrawal S, Moser KA, Morton L, Cummings MP, Parihar A, Dwivedi A, Shetty AC, Drabek EF, Jacob CG, Henrich PP, et al: Association of a Novel Mutation in the Plasmodium falciparum Chloroquine Resistance Transporter With Decreased Piperaquine Sensitivity. The Journal of Infectious Diseases 2017, 216:468–476.
  48. Takala-Harrison S, Jacob CG, Arze C, Cummings MP, Silva JC, Dondorp AM, Fukuda MM, Hien TT, Mayxay M, Noedl H, et al: Independent Emergence of Artemisinin Resistance Mutations Among Plasmodium falciparum in Southeast Asia. The Journal of Infectious Diseases 2014, 211:670–679.
  49. Buchwald AG, Sixpence A, Chimenya M, Damson M, Sorkin JD, Wilson ML, Seydel K, Hochman S, Mathanga DP, Taylor TE, Laufer MK: Clinical Implications of Asymptomatic Plasmodium falciparum Infections in Malawi. Clinical Infectious Diseases 2018, 68:106–112.