Generalised Fiebig-like staging
The fundamental feature of the Fiebig staging system (2) is that it identifies a naturally occurring sequence of discordant diagnostic tests, which together indicate early clinical disease progression. The approximate duration of infection can be deduced by analysing the combination of specific assay results and assigning the appropriate “stage”.
As we demonstrate below, it is preferable to interpret any combination of diagnostic test results into an estimated duration of infection, if these tests have been independently benchmarked for diagnostic sensitivity (i.e. a median or mean duration of time from infection to detectability on that assay has been estimated). Unlike with Fiebig staging, this more nuanced method allows both for incorporation of results from any available test, and from results of tests run on specimens taken on different days.
In contrast to the usual statistical definition of ‘sensitivity’ as the proportion of ‘true positive’ specimens that produce a positive result, we summarise the population-level sensitivity of any particular diagnostic test into one or two ‘diagnostic delay’ parameters (a median delay and, where available, a measure of its spread; both are marked in Figure 1). Interpreted at the population level, a particular test’s sensitivity curve expresses the probability that a specimen obtained at some time after infection will produce a positive result. The key features of a test’s sensitivity curve (represented by the purple curve in Figure 1) are that:
 there is effectively no chance of detecting an infection immediately after exposure;
 after some time, the test will almost certainly detect an infection;
 there is a characteristic time range over which this function transitions from close to zero to close to one. This can be summarised as something very much like a mean or median and a standard deviation.
By far the most important parameter is an estimate of the ‘median diagnostic delay’, which is marked in Figure 1. If there were perfect test result conversion for all subjects (i.e. no assay ‘noise’), and no inter-subject variability, the smoothly varying purple curve would reduce to a step function at this delay.
Various host and pathogen attributes, such as concurrent infections, age, pregnancy status, the particular viral genotype, post-infection factors, etc., affect the performance of a test for a particular individual. This determines a subject’s specific sensitivity curve, such as one of the green curves in Figure 1, which capture the probability that specimens from a particular subject will produce a positive diagnostic result. Because assay results are themselves not perfectly reproducible even on the same individual, even these green curves do not transition step-like from zero to one, but rather have a finite window of time over which they transition from close to zero to close to one.
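The shape of such a sensitivity curve can be illustrated with a minimal sketch. The cumulative normal form used here is an assumption made purely for illustration (the text only requires a smooth transition from zero to one), and the function and parameter names are hypothetical.

```python
import math


def sensitivity(days_since_infection, median_delay, sd):
    """Probability of a positive result at a given time after infection.

    Assumes (for illustration only) a cumulative normal curve centred on
    the median diagnostic delay. With sd == 0 the curve collapses to a
    step function at the median delay.
    """
    if sd == 0:
        return 1.0 if days_since_infection >= median_delay else 0.0
    z = (days_since_infection - median_delay) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Hypothetical test with a 10-day median delay and a 2-day standard deviation.
at_exposure = sensitivity(0, 10, 2)    # effectively zero
at_median = sensitivity(10, 10, 2)     # exactly one half
much_later = sensitivity(20, 10, 2)    # effectively one
```

Setting the standard deviation to zero reproduces the idealised step function described above, which is the simplification used when only the median delay is available.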
To estimate individual infection times, then, one needs to obtain estimates of the median diagnostic delays (i.e. the purple curve in Figure 1) for all tests occurring in a data set, and then interpret each individual assay result as excluding segments of time during which infection was not possible, ultimately resulting in a final inferred interval of time during which infection likely occurred.
These calculations require that each individual has at least one negative test result and at least one positive test result. In the primitive case where there is precisely one of each, namely a negative result at time $t_1$ on a test with an expected diagnostic delay of $d_1$ and a positive result at time $t_2$ on a test with an expected diagnostic delay of $d_2$, the interval is simply from $(t_1 - d_1)$ to $(t_2 - d_2)$. When there are multiple negative results on tests at times $t_1^{(i)}$, each with a diagnostic delay $d_1^{(i)}$, and/or multiple positive results on tests at times $t_2^{(j)}$, each with a diagnostic delay $d_2^{(j)}$, each individual negative or positive test result provides a candidate earliest plausible or latest plausible date of infection, respectively. The most informative tests, then, are the ones that most narrow the ‘infection window’ (i.e. result in the latest start and earliest end of the window). Here, the point of first ‘detectability’ refers to the time when the probability of infection being detected by an assay first exceeds 0.5.
These remaining plausible ‘infection windows’ are usually summarised as intervals, the midpoint of which is naturally considered a ‘point estimate’ of the date of infection. Figure 2 illustrates the way this method works, on a particular (hypothetical) individual. Given two negative test results on one date and two positive test results on a later date, a plausible infection window can be estimated using the diagnostic delays of the assays in question (each labelled in the figure). Note that it is the most sensitive negative test and the least sensitive positive test that prove most informative, because they exclude the greatest periods of time during which infection could not have occurred.
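The date arithmetic described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name, the tuple layout and the delay values are all hypothetical, and delays are rounded to whole days to keep the date handling simple.

```python
from datetime import date, timedelta


def infection_window(results):
    """Return (earliest, midpoint, latest) plausible dates of infection.

    `results` is an iterable of (test_date, delay_days, is_positive).
    Each negative result gives a candidate earliest date and each
    positive result a candidate latest date (test date minus delay).
    """
    earliest = [d - timedelta(days=round(delay))
                for d, delay, positive in results if not positive]
    latest = [d - timedelta(days=round(delay))
              for d, delay, positive in results if positive]
    if not earliest or not latest:
        raise ValueError("need at least one negative and one positive result")
    start = max(earliest)  # most sensitive negative test wins
    end = min(latest)      # least sensitive positive test wins
    midpoint = start + (end - start) / 2  # partial days are dropped
    return start, midpoint, end


# Hypothetical subject: two negatives on one date, two positives later.
example = [
    (date(2017, 1, 1), 7, False),
    (date(2017, 1, 1), 20, False),
    (date(2017, 2, 10), 10, True),
    (date(2017, 2, 10), 30, True),
]
start, midpoint, end = infection_window(example)
```

In this example the negative test with the 7-day delay sets the start of the window and the positive test with the 30-day delay sets its end, mirroring the observation that the most sensitive negative and least sensitive positive tests are the most informative.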
These infection intervals can be understood as plateaus on a very broadly plateaued (rather than ‘peaked’) likelihood function, as shown in Figure 3. Given a uniform prior, this can be interpreted as a Bayesian posterior, with Figure 3 also showing the 95% credibility interval (i.e. the interval encompassing 95% of the posterior probability density). Such a posterior, derived from an individual’s diagnostic testing history, could also serve as a prior for further analysis if there is an available quantitative biomarker for which there is a robustly calibrated maturation/growth curve model. We do not deal with this in the present work, but it is explored elsewhere (10), and is an important potential application of this framework and tool.
In Appendix A we derive a formal likelihood function – i.e. a formula capturing the probability of seeing a data element or set (in this case, the set of negative and positive test results), given hypothetical values of the parameter(s) of interest – here, the time of infection. This interpretation of individual test results relies on the assumption that test results are independent. Of course, the very factors that influence the individual (green) sensitivity curves in Figure 1 suggest that strong correlations between results of different tests on the same person are likely. Given this, we further demonstrate in Appendix A when and how test correlation might influence the analysis. While this method does not require a preset list of infection stages dependent upon defined assay combinations (as with Fiebig staging), it does require estimation of the diagnostic delay for each assay, either by sourcing direct estimates of the diagnostic delay, or by sourcing such data for a biochemically equivalent assay. Our online HIV infection dating tool, described below, is preloaded with diagnostic delay estimates for over 60 HIV assays, and users can both add new tests and provide alternative diagnostic delay estimates for those tests which are already included.
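As an informal illustration of the independence assumption (the formal derivation is the one in Appendix A, which may differ in notation and detail), the likelihood of a hypothetical infection time $T$ can be written as a product over test results. The symbols $S_k$, $t_k$ and $d_k$ are introduced here for illustration only: $S_k(\tau)$ is the sensitivity curve of test $k$ as a function of time since infection, $t_k$ is the date of the test and $d_k$ its median diagnostic delay.

```latex
L(T) \;=\; \prod_{i \,\in\, \text{negative}} \bigl[\, 1 - S_i(t_i - T) \,\bigr]
      \;\prod_{j \,\in\, \text{positive}} S_j(t_j - T)
```

If every $S_k$ is idealised as a step function at $d_k$, then $L(T)$ equals one for $\max_i (t_i - d_i) \le T \le \min_j (t_j - d_j)$ and zero elsewhere, which recovers the simple infection window obtained by date arithmetic.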
Implementation
The public online Infection Dating Tool is available at https://tools.incidenceestimation.org/idt/. The source code for the tool is available publicly under the GNU General Public License Version 3 open source licence at doi:10.5281/zenodo.1488117. The user-facing web interface is described in Appendix B (Grebe et al 2019 Appendix B.pdf).
In practice, the timing of infectious exposure is seldom known, even in intensive studies, and studies of diagnostic test performance therefore provide relative times of test conversion (11-13). Diagnostic delay estimates are therefore anchored to a standard reference event: the first time that a highly sensitive viral load assay with a detection threshold of 1 RNA copy/mL of plasma would detect an infection. We call this the Date of Detectable Infection (DDI). The tool produces a point estimate of this date for each study subject, called the Estimated Date of Detectable Infection (EDDI). Details and evaluation of the performance of the diagnostic delay estimates underlying this tool compared with other methods for estimation of infection dates are available elsewhere (10).
The key features of our online tool for HIV infection date estimation are that:
 Users access the tool through a free website where they can register and maintain a profile which saves their work, making future calculations more efficient.
 Individual test dates and positive/negative results, i.e. individual-level ‘testing histories’, can be uploaded in a single comma-delimited text file for a group of study subjects.
 Estimates of the relative ‘diagnostic delay’ between the assays used and the reference viral load assay must be provided, with the option of using a curated database of test properties which provides cited estimates for over 60 HIV assays.
a) If a viral load assay’s detection threshold is known, this can be converted into a diagnostic delay estimate via the exponential growth curve model (2, 10). We assume that after the viral load reaches 1 RNA copy/mL, viral load increases exponentially during the initial ramp-up phase. The growth rate has been estimated at 0.35 log10 RNA copies/mL per day (i.e., a doubling time of slightly less than one day) (2). The growth rate parameter defaults to this value, but users can supply an alternative estimate.
 Using the date arithmetic described above, when there is at least one negative test result and at least one positive test result for a subject, the uploaded diagnostic history results in:
a) a point estimate for the date of first detectability of infection (the EDDI);
b) an earliest plausible and latest plausible date of detectable infection (EPDDI and LPDDI); and
c) the number of days between the EPDDI and LPDDI (i.e., the size of the ‘DDI interval’), which gives the user a sense of the precision of the estimate.
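The conversion of a viral load detection threshold into a diagnostic delay, mentioned in the list above, follows directly from the exponential growth assumption. The sketch below uses the default growth rate quoted in the text; the function names are hypothetical and this is not the tool's own code.

```python
import math

# Default growth rate quoted in the text, in log10 RNA copies/mL per day.
DEFAULT_GROWTH_RATE = 0.35


def viral_load_delay(threshold_copies_per_ml, growth_rate=DEFAULT_GROWTH_RATE):
    """Days from the reference event (1 copy/mL) until the viral load
    reaches the assay's detection threshold, under exponential growth."""
    if threshold_copies_per_ml <= 0:
        raise ValueError("detection threshold must be positive")
    return math.log10(threshold_copies_per_ml) / growth_rate


def doubling_time(growth_rate=DEFAULT_GROWTH_RATE):
    """Days needed for the viral load to double at the given growth rate."""
    return math.log10(2) / growth_rate


delay_at_reference = viral_load_delay(1)    # zero by construction
delay_at_100 = viral_load_delay(100)        # 2 / 0.35 days
default_doubling = doubling_time()          # a little under one day
```

An assay with a threshold of 1 copy/mL has a delay of zero by construction, since that is the reference event to which all other delays are anchored.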
Access / User profiles
Anyone can register as a user of the tool. The tool saves users’ data files as well as their choices about which diagnostic delay estimates to use for each assay, both of which are only accessible to the user who uploaded them. No personally identifying information is used or stored within the tool; hence, unless the subject identifiers being used to link diagnostic results can themselves be linked to people (which should be ruled out by preprocessing before upload), there is no sensitive information being stored on the system.
Uploading diagnostic testing histories
A single data file would be expected to contain a ‘batch’ of multiple subjects’ diagnostic testing histories. Conceptually, this is a table like the fictitious example in Table 1, which records that:
 one subject (Subject A) was seen on 10 January 2017, at which point he had a detectable viral load on an unspecified qualitative viral load assay, but a negative Bio-Rad Geenius™ HIV 1/2 Supplemental Assay (Geenius) result;
 another subject (Subject B) was screened negative using a point-of-care (PoC) rapid test (RT) on 13 September 2016, and then, on 4 February 2017, was confirmed positive by Geenius, having also tested positive that day on the PoC RT.
Table 1

| Subject | Date | Test | Result |
|---|---|---|---|
| Subject A | 2017-01-10 | Qualitative VL | Positive |
| Subject A | 2017-01-10 | Geenius | Negative |
| Subject B | 2016-09-13 | POC RT | Negative |
| Subject B | 2017-02-04 | POC RT | Positive |
| Subject B | 2017-02-04 | Geenius | Positive |

Sample data file for uploading diagnostic testing histories into the tool. Abbreviations: VL = viral load assay, Geenius = Bio-Rad Geenius™ HIV 1/2 Supplemental Assay, POC = point of care, RT = rapid test
To facilitate automated processing, the tool requires a list of column names as the first row in any input file. While extraneous columns are allowed without producing an error, there must be columns named Subject, Date, Test and Result (not case sensitive). Data in the subject column is expected to be an arbitrary string that uniquely identifies each subject. Dates must be in the standard ISO format (YYYY-MM-DD).
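The input requirements above can be illustrated with a small validation sketch. This is not the tool's parser: the function name and the returned row layout are hypothetical, and the sketch assumes hyphenated ISO dates (YYYY-MM-DD).

```python
import csv
import io
from datetime import date

REQUIRED_COLUMNS = {"subject", "date", "test", "result"}


def read_history(text):
    """Parse a comma-delimited testing history into a list of dicts.

    Column names are matched case-insensitively and extra columns are
    ignored. Raises ValueError if a required column is missing or a
    date is not a valid ISO date.
    """
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None:
        raise ValueError("file is empty")
    # Map lower-cased header names back to the names used in the file.
    names = {name.strip().lower(): name for name in reader.fieldnames}
    missing = REQUIRED_COLUMNS - names.keys()
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for raw in reader:
        rows.append({
            "subject": raw[names["subject"]].strip(),
            "date": date.fromisoformat(raw[names["date"]].strip()),
            "test": raw[names["test"]].strip(),
            "result": raw[names["result"]].strip().lower(),
        })
    return rows


sample = (
    "SUBJECT,Date,test,Result,Site\n"
    "Subject A,2017-01-10,Qualitative VL,Positive,X\n"
    "Subject A,2017-01-10,Geenius,Negative,X\n"
)
parsed = read_history(sample)
```

The extra `Site` column in the sample is accepted and ignored, while a file lacking any of the four required columns is rejected before any rows are read.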
It is fundamental to the simplicity of the algorithm that assay results be either ‘positive’ or ‘negative’. There are a small number of tests, notably Western blot and the Geenius, which sometimes produce ‘indeterminate’ results (partially, but not fully, developed band pattern). Note that there is some lack of standardisation on interpretation of the Western blot, with practice differing in the United States and Europe, for example. While we provide default values for common Western blot assays, users may enter appropriate estimates for the specific products and interpretations in use in their specific context.
We now briefly reconsider Table 1 by adding the minor twist that the Geenius on Subject B is reported as indeterminate. In this case, the data must be recorded as results on either one or both of two separate tests:
 a ‘TestIndeterminate’ version of the test – which notes whether a subject will be classified either as negative, or ‘at least’ as indeterminate; and
 a ‘TestFull’ version of the test, which determines whether a subject is fully positive or not.
There is then no longer any use for an unsuffixed version of the original test. The data from Table 1 is repeated in Table 2 with differences highlighted. The only changes are the use of the TestIndeterminate version for Subject A’s negative Geenius result and an indeterminate Geenius result for Subject B. Note that even while Subject A’s test results have not changed, their testing history now looks different, as completely negative results are reported as being negative even for the condition of being indeterminate. Subject B’s indeterminate result on 4 February requires two rows to record, one to report that the test result is not fully negative (positive on ‘Geenius Indeterminate’), and one to report that the result is not fully positive (negative on ‘Geenius Full’). Once diagnostic delays are provided for these two subtests, the calculation of infection dates can proceed without any further data manipulation on the part of the user.
Table 2

| Subject | Date | Test | Result |
|---|---|---|---|
| Subject A | 2017-01-10 | Qualitative VL | Positive |
| Subject A | 2017-01-10 | Geenius Indeterminate | Negative |
| Subject B | 2016-09-13 | POC RT | Negative |
| Subject B | 2017-02-04 | POC RT | Positive |
| Subject B | 2017-02-04 | Geenius Indeterminate | Positive |
| Subject B | 2017-02-04 | Geenius Full | Negative |

Sample data file for uploading diagnostic testing histories into the tool, with indeterminate results. Abbreviations: VL = viral load assay, Geenius = Bio-Rad Geenius™ HIV 1/2 Supplemental Assay, Full = fully reactive, POC = point of care, RT = rapid test
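The recoding of a three-valued result into the two sub-tests could be automated before upload along the following lines. This is a sketch of one possible preprocessing step, not part of the tool; the function name is hypothetical, and recording a fully positive result as a single positive row on the ‘Full’ sub-test is an assumption, since the text only shows the negative and indeterminate cases.

```python
def expand_result(test, result):
    """Recode a negative/indeterminate/positive result as rows on the
    'Indeterminate' and 'Full' sub-tests of the given test."""
    outcome = result.strip().lower()
    indeterminate = f"{test} Indeterminate"
    full = f"{test} Full"
    if outcome == "negative":
        # Negative even for the weaker condition of being indeterminate.
        return [(indeterminate, "Negative")]
    if outcome == "indeterminate":
        # Not fully negative, but not fully positive either.
        return [(indeterminate, "Positive"), (full, "Negative")]
    if outcome == "positive":
        # Assumption: one positive row on the 'Full' sub-test suffices.
        return [(full, "Positive")]
    raise ValueError(f"unrecognised result: {result!r}")


subject_a_rows = expand_result("Geenius", "Negative")
subject_b_rows = expand_result("Geenius", "Indeterminate")
```

Applied to the Geenius results of Subject A and Subject B, this reproduces the corresponding rows of Table 2.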
Provision of test diagnostic delay estimates
As described above, tests are summarised by their diagnostic delays. The database supports multiple diagnostic delay estimates for any test, acknowledging that these estimates may be provisional and/or disputed. The basic details identifying a test (i.e. name, test type) are recorded in a ‘tests’ table, and the diagnostic delay estimates are entered as records in a ‘testproperties’ table, which then naturally allows multiple estimates by allowing multiple rows which ‘link’ to a single entry in the tests table. A test property entry captures the critical parameter of the ‘average’ (usually median) diagnostic delay obtained from experimental data and, when available, a measure of the variability of the diagnostic delay (its standard deviation).
The system’s user interface always ensures that for each user profile, there is exactly one test property estimate per test, chosen by the user, in use for infection dating calculations at any point in time. Users need to ‘map’ the codes occurring in their data files (i.e. the strings in the ‘Test’ column of uploaded data files) to the tests and diagnostic delay estimates in the database, with the option of adding entirely new tests to the database, which will only be visible to the user who uploaded them. The tool developers welcome additional test estimates submitted for inclusion in the system-default tests/estimates.
Execution of infection dating estimation
The command button ‘process’ becomes available when an uploaded testing history has no unmapped test codes. Pressing the button leads to values, per subject, for EPDDI, LPDDI, EDDI, and DDI interval, which can be previewed onscreen and downloaded as a comma-delimited file.
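The precondition for processing, namely that every test code in the uploaded history has been mapped, amounts to a simple set difference. The sketch below is illustrative only; the function name, the row layout and the mapping structure (test code to a pair of hypothetical test and estimate identifiers) are assumptions.

```python
def unmapped_codes(history_rows, mapping):
    """Return the sorted test codes that appear in the history but have
    no entry in the user's mapping."""
    codes = {row["test"] for row in history_rows}
    return sorted(codes - set(mapping))


history = [
    {"subject": "Subject A", "test": "Qualitative VL"},
    {"subject": "Subject A", "test": "Geenius"},
    {"subject": "Subject B", "test": "POC RT"},
]
# Hypothetical mapping: test code -> (test id, chosen estimate id).
user_mapping = {"Geenius": (1, 10), "POC RT": (2, 20)}

still_unmapped = unmapped_codes(history, user_mapping)
ready_to_process = not still_unmapped
```

Processing would only be offered once the list of unmapped codes is empty.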
By default, the system employs only the ‘average’ diagnostic delay parameter, in effect placing the EPDDI and LPDDI bounds on the DDI interval where the underlying sensitivity curve evaluates to a probability of detection of 0.5. When the size of the inter-test interval is greater than about 20 times the diagnostic delay standard deviation, this interval encompasses more than 95% of the posterior probability.
As an additional option, when values for both the average diagnostic delay and its standard deviation are available, users may specify a significance level ($\alpha$), at which point the system will calculate the bounds of a corresponding credibility interval. The bounds of the central 95% (in the case of $\alpha = 0.05$) of the posterior are then labelled the EPDDI and LPDDI.
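A credibility interval of this kind can be approximated numerically. The sketch below is not the tool's implementation: it assumes cumulative normal sensitivity curves, independent test results and a uniform prior, works in plain day numbers rather than dates, and uses hypothetical names throughout.

```python
import math


def _normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def credibility_interval(negatives, positives, alpha=0.05, step=0.01):
    """Central (1 - alpha) interval of the posterior for the infection time.

    `negatives` and `positives` are lists of (test_day, median_delay, sd)
    with sd > 0. The posterior is evaluated on a regular grid.
    """
    tests = negatives + positives
    low = min(t - d - 6 * s for t, d, s in tests)
    high = max(t - d + 6 * s for t, d, s in tests)
    grid = [low + k * step for k in range(int((high - low) / step) + 1)]
    density = []
    for x in grid:
        p = 1.0
        for t, d, s in negatives:
            p *= 1.0 - _normal_cdf((t - x - d) / s)  # test still negative
        for t, d, s in positives:
            p *= _normal_cdf((t - x - d) / s)        # test already positive
        density.append(p)
    total = sum(density)
    cumulative, lower, upper = 0.0, None, None
    for x, p in zip(grid, density):
        cumulative += p / total
        if lower is None and cumulative >= alpha / 2:
            lower = x
        if upper is None and cumulative >= 1 - alpha / 2:
            upper = x
    return lower, upper


# Hypothetical case: adjusted dates 20 days apart, 1-day standard deviation.
lower, upper = credibility_interval([(0, 0, 1)], [(20, 0, 1)])
```

In this example the adjusted dates are 20 standard deviations apart, and the central 95% interval falls just inside the simple window from day 0 to day 20, consistent with the rule of thumb stated above.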
Database Schema
This tool makes use of a relational database, which records information in a set of linked tables, including:
 subjects: This table captures each unique study subject, and after infection date estimation has been performed, the subject’s EDDI, EPDDI, LPDDI and DDI interval size.
 diagnostic_test_history: This table records each test performed, by linking to the subjects table and recording a date, a ‘test code’, and a result. During the estimation procedure, a field containing an ‘adjusted date’ is populated, which records the candidate EPDDI (in the case of a negative result) or LPDDI (in the case of a positive result) after the relevant diagnostic delay has been applied to the actual test date.
 diagnostic_tests: This is a lookup table listing all known tests applicable to the current purposes (both system-provided and user-provided).
 test_property_estimates: This table records diagnostic delay estimates (system-provided and user-provided). It allows multiple estimates per test, with system default estimates flagged.
 test_property_mapping: This table records user-specific mapping of test codes by linking each test code in the diagnostic_test_history table to a test in the diagnostic_tests table, as well as the specific test property estimate ‘in use’ by that user for the test in question.
A number of subsidiary tables also exist to manage users of the system and allow linking of personal data files, maps, tests, and test property estimates to specific users.
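The relationships between these tables can be made concrete with a small sketch. The table names follow the list above, but every column name, the data types and the use of SQLite are assumptions made for illustration; the tool's actual schema may differ.

```python
import sqlite3

# Table names come from the text; all columns are hypothetical.
SCHEMA = """
CREATE TABLE subjects (
    id INTEGER PRIMARY KEY,
    subject_label TEXT NOT NULL,
    eddi TEXT,
    epddi TEXT,
    lpddi TEXT,
    ddi_interval_days INTEGER
);
CREATE TABLE diagnostic_tests (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    test_type TEXT
);
CREATE TABLE test_property_estimates (
    id INTEGER PRIMARY KEY,
    test_id INTEGER NOT NULL REFERENCES diagnostic_tests(id),
    diagnostic_delay REAL NOT NULL,
    diagnostic_delay_sd REAL,
    is_default INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE diagnostic_test_history (
    id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES subjects(id),
    test_date TEXT NOT NULL,
    test_code TEXT NOT NULL,
    test_result TEXT NOT NULL,
    adjusted_date TEXT
);
CREATE TABLE test_property_mapping (
    id INTEGER PRIMARY KEY,
    test_code TEXT NOT NULL,
    test_id INTEGER REFERENCES diagnostic_tests(id),
    test_property_id INTEGER REFERENCES test_property_estimates(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
table_names = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
```

Keeping the estimates in their own table, separate from the tests, is what allows several competing delay estimates to coexist for one test while each user's mapping selects exactly one of them.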