The design of the study is presented in Fig. 8. Initially, we developed a dynamic mathematical model describing the occurrence, and the change in the occurrence, of an ARG of interest in the population over time. This model describes the "true" occurrence of the ARG of interest at any given time.
Secondly, a model mimicking the surveillance program was overlaid on the model of the "true" occurrence to simulate the results that would be observed in surveillance, given the settings of the sampling schedule (frequency of sampling, sample size, and pooling) and the metagenomic procedure (sequencing depth). In each iteration, a time series of observed data was generated, taking into account the stochasticity of sampling and metagenomics. Each time series was subsequently analysed statistically to identify the time points at which the start of changes in occurrence could be detected, and to estimate long-term trends. In addition, given each time series, the future occurrence of the ARG of interest in the population was forecast. Finally, the results of the statistical analyses were compared with the known truth given by the dynamic model describing the occurrence, and the change in the occurrence, of an arbitrary ARG in the population over time.
Mathematical notation of the model describing the "true" occurrence and the change in the occurrence of an arbitrary ARG in the population over time
The true occurrence of an ARG in this study was defined by three components: the occurrence of the ARG at farm level (proportion of farms positive), the within-farm prevalence (proportion of pigs within a positive farm carrying the ARG), and the concentration of the ARG in the faeces of pigs carrying the gene. The concentration of the ARG(s) of interest in faeces was obtained from a Weibull distribution fitted27–29 to real-world data on the concentration of ARGs in pig faeces30,31. We chose the Weibull distribution because it gave the best fit to the data compared with the gamma and lognormal distributions (for details, see Supplementary Fig. 1).
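As an illustration of this comparison, a minimal sketch using the fitdistrplus package is given below; the concentration vector is hypothetical placeholder data, not the data of refs. 30,31.

```r
# Sketch: compare candidate distributions for ARG concentrations in pig
# faeces. 'conc' is hypothetical placeholder data, not the data of
# refs. 30,31.
library(fitdistrplus)

set.seed(42)
conc <- rweibull(500, shape = 0.8, scale = 120)  # placeholder concentrations

fit_weibull <- fitdist(conc, "weibull")
fit_gamma   <- fitdist(conc, "gamma")
fit_lnorm   <- fitdist(conc, "lnorm")

# Lower AIC/BIC indicates a better fit
gofstat(list(fit_weibull, fit_gamma, fit_lnorm),
        fitnames = c("Weibull", "gamma", "lognormal"))
```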
In this study, three different scenarios of true occurrence were modelled, each representing ARGs with different occurrences and changes: 1) an increased occurrence of an ARG that is already widely present in all pigs in all farms (high-level endemic), 2) an increased occurrence of an ARG that is present but at a lower frequency (low-level endemic), and 3) the de novo emergence of a novel ARG (emerging). In all scenarios, we used metagenomic data obtained from samples collected in a previous study of the Danish pig population15. In scenario 1, we used metagenomic data for tet(M), and in scenario 2 we used metagenomic data for a \(\beta\)-lactam resistance gene. These two genes are widespread in the Danish pig population, and therefore, in both scenarios 1 and 2, the prevalence of positive farms and of positive animals within farms was set at 100%. Within the first 120 months, we assumed no increase in occurrence. After 120 months, we introduced a 5% annual increase in concentration.
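To make this trajectory concrete, a minimal sketch of the "true" mean concentration under scenarios 1 and 2 follows; the monthly compounding of the 5% annual increase and the baseline value are our illustrative assumptions, as the paper does not state the compounding scheme.

```r
# Sketch: "true" mean concentration over the 240 simulated months in
# scenarios 1 and 2: flat for 120 months, then a 5% annual increase
# (compounded monthly here; the compounding scheme is our assumption).
true_concentration <- function(month, baseline) {
  ifelse(month <= 120, baseline,
         baseline * 1.05^((month - 120) / 12))
}
trend <- true_concentration(1:240, baseline = 100)  # baseline is arbitrary
```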
In scenario 3, a new ARG was introduced into one farm in a population of 4,000 farms, after which the ARG was allowed to spread to other farms over time (emerging). The spread followed a generalised Susceptible-Infected (\(SI\)) compartmental model32 (for details, see Supplementary Eq. 1). Our model assumed a homogeneously mixing system, with a transmission rate (0.148) resulting in 85% contaminated farms after five years. The \(SI\) model only describes the spread of the ARG between farms. As soon as the ARG was introduced into a farm, the occurrence of pigs carrying the ARG was assumed to be the same as in a farm that had been infected for a longer period (90%). We assumed that the within-pig concentration of the emerging ARG was the same as the concentration of the blaOXA gene (a resistance gene with an extremely low concentration in the Danish pig population). The numbers of samples per pool that were simulated were 5, 20, 50, 100, and 200.
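A minimal sketch of the between-farm spread is given below, assuming a frequency-dependent, discrete-time SI formulation; the exact formulation used in the study is the one in Supplementary Eq. 1.

```r
# Sketch: discrete-time SI spread of a novel ARG between farms under
# homogeneous mixing (the exact formulation used in the study is given
# in Supplementary Eq. 1).
si_farms <- function(n_farms = 4000, beta = 0.148, months = 60, I0 = 1) {
  I <- numeric(months)
  I[1] <- I0
  for (t in 2:months) {
    new_infections <- beta * I[t - 1] * (n_farms - I[t - 1]) / n_farms
    I[t] <- min(n_farms, I[t - 1] + new_infections)
  }
  I / n_farms  # proportion of ARG-positive farms per month
}
prop_positive <- si_farms()
tail(prop_positive, 1)  # proportion of positive farms after five years
```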
Model for simulating results obtained in the surveillance program
In the simulation, we generated the results obtained in a hypothetical monitoring program, including the effect of the design of the monitoring: the frequency of sampling and the number of samples collected at each sampling time point. In this study, we assumed that all samples collected at each time point were pooled into one single technical sample before sequencing. Depending on the number of DNA fragments sequenced (sequencing depth)33–35 and the concentration of the ARG of interest36, the initial output from the model is the number of DNA fragments assigned to the ARG of interest.
Subsequently, we calculated the counts of the ARG divided by the total number of fragments sequenced in the sample, in millions: counts per million (#CPM)37,38. Thereby, the outcome of AMR surveillance using metagenomics is the #CPM in the population. The #CPM in each faecal sample is the product of whether the sample was collected from a farm with the ARG (0 or 1), \(binomial(n=1, prob=farmprev)\); whether the sampled animal within a positive farm carried the ARG (0 or 1), \(binomial(n=1, prob=withinfarmprev)\); and the concentration of the ARG if the animal carried it (a random draw from the Weibull distribution). The model includes a pooling procedure in which the faecal samples collected at each sampling time point (assumed to be 5, 20, 50, 100, or 200) are pooled into one technical sample before DNA sequencing. The mean of the CPM across the pooled faecal samples represents the concentration of the ARG in the technical sample at a given point in time. The simulation was conducted over 240 months, with a sampling frequency of once per month.
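This sampling-and-sequencing chain can be sketched as follows; the prevalences, Weibull parameters, and the binomial model for fragment assignment are illustrative assumptions, not the study's exact implementation.

```r
# Sketch: observed #CPM for one pooled technical sample at one time point.
# Prevalences, Weibull parameters, and the binomial model for fragment
# assignment are illustrative assumptions.
simulate_cpm <- function(n_samples, farm_prev, within_prev,
                         shape, scale, depth) {
  farm_pos   <- rbinom(n_samples, size = 1, prob = farm_prev)   # farm has ARG?
  animal_pos <- rbinom(n_samples, size = 1, prob = within_prev) # pig carries ARG?
  conc       <- rweibull(n_samples, shape = shape, scale = scale)
  pool_cpm   <- mean(farm_pos * animal_pos * conc)  # pooled "true" CPM
  # stochastic sequencing: fragments assigned to the ARG at the given depth
  counts <- rbinom(1, size = depth, prob = pool_cpm / 1e6)
  counts / depth * 1e6  # observed #CPM
}
set.seed(1)
simulate_cpm(n_samples = 50, farm_prev = 1, within_prev = 1,
             shape = 0.8, scale = 120, depth = 1e8)
```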
In theory, as the design factors frequency of sampling, sample size, or sequencing depth go to infinity, the sensitivity of the monitoring program approaches 100%, and the lag time between a change in occurrence and the detection of that change approaches zero. To obtain realistic values of the design factors, we developed an equation in which the prices of sampling and sequencing were integrated with the variables sample size and sequencing depth, summing to a given total annual cost of a monitoring program39. We developed a cost function (Eq. 3) to estimate how deeply we would be able to sequence each pooled technical faecal sample, given n faecal samples (n = 5, 20, 50, 100, and 200) randomly collected once a month. The cost was calculated as
$$\text{Total price} = \text{lab cost} \times k + \text{lab technician cost} + n \times \text{price per sample} + m \times \text{price per fragment} \quad \left(3\right)$$
where \(k\) is the number of technical samples processed in the lab per month (in this study, \(k=1\)), and \(n\) and \(m\) are the number of samples in the pool and the sequencing depth, respectively (for details, see Supplementary Table 1).
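Rearranging Eq. 3 for a fixed budget gives the affordable sequencing depth \(m\), as sketched below; all prices are hypothetical placeholders (the actual values are in Supplementary Table 1).

```r
# Sketch: affordable sequencing depth m for a pool of n samples under a
# fixed monthly budget, rearranging Eq. 3. All prices are hypothetical
# placeholders; the actual values are in Supplementary Table 1.
depth_for_budget <- function(budget, n, k = 1,
                             lab_cost = 100, technician_cost = 200,
                             price_per_sample = 2, price_per_fragment = 1e-6) {
  fixed <- lab_cost * k + technician_cost + n * price_per_sample
  (budget - fixed) / price_per_fragment  # remaining budget buys m fragments
}
sapply(c(5, 20, 50, 100, 200),
       function(n) depth_for_budget(budget = 2000, n = n))
```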
An important determinant of the likelihood that metagenomics detects an emerging ARG is the number of gene fragments sequenced (sequencing depth)40–42. Which fragments are sequenced in a sample is entirely random, whereas the number of sequenced fragments can be pre-defined within a certain range. The mean sequencing depth for each sample pool was determined using \(rtruncnorm(n, a, b, mean, sd)\), where \(n\) is the number of observations (months), \(a\) is the lower bound on the mean sequencing depth, \(b\) is the upper bound on the mean sequencing depth, \(mean\) is the mean sequencing depth, and \(sd\) is \((mean-a)/3\). To add randomness to the sequencing depth, we assumed that the depth varies stochastically within \(\pm\)20% of the mean, following a normal distribution in which the range between \(-\)20% and \(+\)20% equals 6 SD. During simulation, the sequencing depth was truncated at \(\pm\)3 SD43. In the simulation of all scenarios and sub-scenarios, 1,001 iterations were performed using Monte Carlo simulation44–46, representing 1,001 time series of observations.
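In R, the monthly mean sequencing depths can be drawn as follows, using rtruncnorm() from the truncnorm package; the mean depth below is a placeholder value.

```r
# Sketch: monthly mean sequencing depths drawn from a truncated normal,
# matching the rtruncnorm() call described above; the mean depth is a
# placeholder value.
library(truncnorm)

mean_depth <- 1e8               # hypothetical mean sequencing depth
a <- 0.8 * mean_depth           # lower bound: mean - 20%
b <- 1.2 * mean_depth           # upper bound: mean + 20%
s <- (mean_depth - a) / 3       # the +/-20% range corresponds to 6 SD

depths <- rtruncnorm(240, a = a, b = b, mean = mean_depth, sd = s)
```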
Statistical analyses of the time series obtained in the simulations
The performance of the surveillance program47 in detecting ongoing changes in the occurrence of ARGs in the population was evaluated by first identifying a change in the observed data using breakpoint analysis48,49, and then calculating the time between the start of the change, as given by the mathematical function defining the true occurrence of the ARG in the population, and the time point at which this change was identified in the time series of observed data.
By applying the breakpoint analysis throughout the time series of observed data, including the initial period of 10 years with no true change, we also obtained the number of false alarms, i.e., instances in which the surveillance indicated a change in #CPM when the occurrence was in fact not changing.
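As a sketch of this evaluation, the detection lag and false-alarm count for one simulated time series could be computed as follows, given the breakpoint months returned by the breakpoint analysis; the function and example values are illustrative.

```r
# Sketch: detection lag and false-alarm count for one simulated time
# series, given the breakpoint months returned by the breakpoint analysis.
# In scenarios 1 and 2 the true change starts at month 120.
evaluate_detection <- function(breakpoints, true_change = 120) {
  false_alarms <- sum(breakpoints < true_change)  # alarms before the change
  detected     <- breakpoints[breakpoints >= true_change]
  lag <- if (length(detected) > 0) min(detected) - true_change else NA
  list(lag_months = lag, false_alarms = false_alarms)
}
evaluate_detection(c(60, 135))  # one false alarm, a 15-month detection lag
```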
Breakpoint analysis - DBEST method
The simulated observed data of an ARG in a population using metagenomics were analysed for changes in occurrence over time using breakpoint analysis. In reality, when the endemic occurrence of AMR starts to change, the increase/decrease takes place gradually over months or years, not as a direct shift from one level to another within a few months. The R package DBEST applies statistical methods to analyse time trends and detect changes over time. DBEST was originally developed to detect and forecast changes in vegetation using remote-sensing information50. We chose DBEST because of its ability to handle #CPM in the form of continuous data, which is not possible with the R package surveillance, which only accepts whole-number (count) data.
In our study, we used DBEST for time-trend analysis, looking for changes in the #CPM of the ARG(s) of interest over time (Supplementary Fig. 2). DBEST detects and estimates trend changes of two types (abrupt and non-abrupt), together with the timing, magnitude, number, and direction of those changes. In the DBEST method, five parameters must be defined before the analysis. The "change_magnitude" parameter defines the size of change in the data that indicates a potential true change. In this study, we used a 5% increase/decrease (change_magnitude, Table 1) relative to the mean #CPM of the AMR gene of interest in the population before an increase occurred. The "duration" parameter is the period after the breakpoint for which the change must persist to indicate a true, persistent change, given the algorithm's sensitivity to detect changes. This was set to six months in our study (Table 1).
In accordance with our knowledge of how ARGs change in animal populations over time, and with how the change was specified in the model, the statistical method should identify non-abrupt changes over a longer time period. Abrupt changes within a short time interval are not expected; therefore, the parameters of the algorithm were set so that it only looks for non-abrupt changes. Specifically, the values of "first-level-shift" and "second-level-shift" were set to extremely high values, so that none of the detected change points were classified as abrupt. This is a technical detail, but it is worth noting. The "alpha" parameter is the statistical significance level used to test the significance of detected changes. Table 1 shows how these five parameters were set in DBEST.
Table 1
The parameters used for the DBEST method for tet(M) and \(\beta\)-lactam

Parameter            Value
First-level-shift    *Reference_Mean
Second-level-shift   *Reference_Mean
Duration             6 months
Alpha                0.05
Change_magnitude     5% of *Reference_Mean

*Reference_Mean is the mean of the AMR-gene of interest in the population before an increase takes place.
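A minimal sketch of how the Table 1 settings might be passed to the DBEST function is given below. The argument names follow the CRAN DBEST package documentation and should be verified against the installed version; the input series and Reference_Mean are hypothetical placeholders, not the study data.

```r
# Sketch: a DBEST change-detection call using the Table 1 settings.
# Argument names follow the CRAN DBEST documentation and should be
# verified; 'observed_cpm' is a hypothetical series, not the study data.
library(DBEST)

set.seed(7)
observed_cpm <- c(rnorm(120, mean = 100, sd = 5),                # stable period
                  100 * 1.05^((1:120) / 12) + rnorm(120, 0, 5))  # 5%/year rise
ref_mean <- mean(observed_cpm[1:120])  # *Reference_Mean (pre-change mean)

result <- DBEST(data = ts(observed_cpm, frequency = 12),
                data.type = "non-cyclical",
                seasonality = "none",
                algorithm = "change detection",
                change.magnitude = 0.05 * ref_mean,
                first.level.shift = ref_mean,    # set high so no change point
                second.level.shift = ref_mean,   # is classified as abrupt
                duration = 6,                    # months
                distance.threshold = "default",
                alpha = 0.05)
result$Start  # estimated start time(s) of the detected change(s)
```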
Estimation of the time-trend
We applied a six-month rolling average to the #CPM data: a window of six consecutive monthly data points was averaged, and the window was then moved forward one point at a time to produce a smoothed CPM series (Fig. 1, red line). This was done to remove stochastic noise from the CPM, making past trends easier to recognise and the future occurrence easier to predict. We used R software (version 4.0.2) for all modelling and data analysis.
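For instance, the six-month rolling average can be computed with the zoo package, as sketched below; the input series is a placeholder.

```r
# Sketch: six-month rolling average of the observed #CPM (the red line in
# Fig. 1); the input series is a placeholder.
library(zoo)

observed_cpm <- rnorm(240, mean = 100, sd = 10)  # hypothetical monthly #CPM
smoothed <- rollmean(observed_cpm, k = 6, align = "right", fill = NA)
```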
Contributors
OOA, FMA, and HV conceived the study. OOA and HV designed the study. All authors acquired and analysed the data. OOA, FMA and HV interpreted the findings. OOA wrote the first draft of the manuscript. FMA and HV contributed to the writing and subsequent versions of the manuscript. OOA developed the code. All authors critically reviewed this paper and approved the final version. The corresponding author (OOA) had final responsibility for the decision to submit for publication.