Human data
In NSW, treating pathology laboratories are required under public health legislation to report all Salmonella infections to the health department. We used deidentified data for 40,837 human cases notified between January 2008 and August 2019. The available case data included: typing information for the Salmonella isolates, patient gender, five-year age group, location, and travel history. Typing methods included serotyping, MLVA typing, and phage typing; however, not all typing methods were performed for every isolate. Location was encoded at the Statistical Area 2 (SA2) level as defined in the Australian Statistical Geography Standard. As we were primarily interested in cases acquired in NSW, we removed all cases recorded as being acquired outside NSW. However, travel history was known only for a small minority of cases, and before May 2010, cases that were known not to have travelled and cases with unknown travel history were recorded identically. Therefore, we also excluded all cases due to serotypes deemed to be travel-associated. A serotype was deemed to be travel-associated if more than half of cases after May 2010 with a known travel history were believed to be acquired outside NSW.
Non-human data
Data for non-human sources were collated from the National Enteric Pathogens Surveillance System (NEPSS) and the New South Wales Food Authority (NSWFA). The two datasets included non-human Salmonella isolates sampled from animals, animal products, farm environments (e.g. animal pens and barns), food, and the natural environment. Isolates considered as from potential animal sources were those derived from the faeces, carcass, or enclosures of an animal or from an animal product. Isolates that were not serotyped or were from a travel-associated serotype were excluded. We also excluded isolates without a recorded date or with a date before 2008.
Non-human animal isolates were categorised by reservoir animal source where possible from the recorded details. For example, all isolates from pig carcasses; pig faeces; cooked, cured, or raw pig meats; pig offal; pig intestinal contents; and pig farm environments (e.g. bedding) were categorised as porcine. Broiler and layer chickens were treated as separate reservoirs. Chicken meat products for human consumption and environmental samples from broiler farm premises and chicken meat processors were categorised as broiler, while isolates from eggs, egg products (e.g. mayonnaise) and egg farm premises were categorised as layer. Isolates that could not be definitively assigned to an animal reservoir were excluded, e.g. feather meal which might have been derived from broilers or layers, and fresh produce which might have been contaminated directly or indirectly by any wild or domestic animal. We excluded isolates from food products with multiple animal origins (e.g. ‘egg and bacon roll’) and isolates with ambiguous descriptions (e.g. ‘meat’). Where possible, putative sources were combined to form categories with at least 100 isolates. Putative sources that could not be reasonably combined into broader categories with at least 100 isolates were excluded.
As source attribution analysis requires the same typing scheme to applied to both source and human data, and serotyping was the most consistently and completely applied method, analyses were conducted using serotyped data, and isolates without serotyping information were removed from both the human and non-human datasets.
Source attribution modelling framework
We generalised existing Bayesian source attribution methods [19, 20] to estimate changing attribution proportions over time, include covariates for the cases, and adjust for differences between types.
The proportion ( \({\theta }_{ijst}\)) of cases in subpopulation \(s\) during period \(t\) that were due to type \(i\) from source \(j\) was modelled as:
$${\theta }_{ijst} \propto {a}_{jst}{w}_{j} {r}_{ijt} {q}_{i}$$
with constraints \({\sum }_{i,j}{\theta }_{ijst}=1\) and \({\sum }_{i}{r}_{ijt}=1\) and where \({a}_{jst}\) was the ability of source \(j\) during period \(t\) to act as a reservoir of infection for group \(s\), \({w}_{j}\) was a weight for the relative exposure of humans to contamination from source \(j\), \({r}_{ijt}\) was the relative prevalence of type \(i\) in source \(j\) during period \(t\), and \({q}_{i}\) was the relative ability of subtype \(i\) to lead to human infection. In each group \(s\) and period \(t\) the proportion of infections of all types attributed to a source, \({\xi }_{jst}\), was:
$${\xi }_{jst}= {\sum }_{i}{\theta }_{ijst}\propto {a}_{jst} {w}_{j}{\sum }_{i}{r}_{ijt} {q}_{i},$$
while the proportion of cases due to each type, \({\mu }_{ist}\), was:
$${\mu }_{ist}= {\sum }_{j}{\theta }_{ijst} \propto {q}_{i}{\sum }_{j}{a}_{jst} {w}_{j}{ r}_{ijt}.$$
The estimation of these parameters occurred in two steps. The distribution of types in each source and period (\({r}_{ijt}\)) were estimated first, and all other parameters were then estimated repeatedly with draws from the posterior distribution of \({r}_{ijt}.\) In the first step, the number of isolates of each type observed in each source (\({X}_{ijt}\)) were modelled with multinomial distributions, in one of two ways. For sources with many isolates in every period, the relative frequency of types were modelled independently for each period based only on the data collected in that period, i.e. \({X}_{\cdot jt} \tilde Multinomial\left({r}_{\cdot jt}\right)\). With a unit Dirichlet prior this resulted in a Dirichlet posterior distribution: \(p\left({r}_{\cdot jt}|X\right) \tilde Dirichlet\left(1+{X}_{\cdot jt}\right).\) For sources with too few samples in each period, the data across the whole study was used for every time period resulting in posterior estimates:\(p\left({r}_{\cdot jt}|X\right) \tilde Dirichlet\left(1+{\sum }_{\tau }{X}_{\cdot j\tau }\right).\)
In estimating the remaining parameters, the efficiency of each type (\({q}_{i}\)) and the exposure weights (\({w}_{j}\)) were assumed to remain constant over time but source efficiencies (\({a}_{jst}\)) were allowed to vary over time and by subgroup of cases. This was modelled as:
$${a}_{jst}=\text{exp}\left({\tau }_{tj }+{\sum }_{n}{{F}_{sn}\beta }_{nj}\right).$$
where \(F\) was a matrix defining a linear predictor based on categorical, ordinal, or continuous covariates for each subgroup (i.e. category of a covariate) \(s\) of the cases; \(\beta\) was a matrix of parameters for each source \(j\); and \(\tau\) defined temporal differences in source efficiency by time \(t\) and source \(j\). A reference group was assigned to each covariate, and the associated parameters fixed to 0, while the remaining parameters are given unit normal priors.
The number of human cases in subpopulation \(s\) during period \(t\) that were due to pathogen type \(i\), were modelled as independent multinomial variables for each period \(t\) and subgroup \(s\), i.e. \({Y}_{\cdot st} \tilde Multinomial\left({\mu }_{\cdot st}\right)\). The \({q}_{i}\) were constrained with a hierarchical log-normal prior:
\(p\left({q}_{i}|\sigma \right) \tilde logNormal\left(0,{\sigma }^{2}\right)\) \(p\left(\sigma \right) \tilde Uniform\left(0, 5\right).\)
For predominantly foodborne infections the exposure weights \({w}_{j}\), can be approximated by the relative level of exposure to contaminated food products derived from each source. However, as this is not measured directly, we modelled this as \({w}_{j}={M}_{j} {k}_{j}\), where \({M}_{j}\) was the per capita consumption of food derived from source \(j\), and \({k}_{j}\) was the prevalence of the pathogen in food derived from source \(j\), which was estimated from surveys of animal food products. For each source \(j\), we modelled the number of total tests (\({N}_{j}\)) and positive tests (\({P}_{j}\)) \({P}_{j} \tilde Binomial\left({N}_{j}, {k}_{j}\right),\) with an uninformative uniform prior on prevalence, i.e.\({p(k}_{j}) \tilde Beta\left(\text{1,1}\right).\)
Our model framework was extended to include an ‘unsampled source’ by including an additional source \({j}^{*}\) with no observed samples, i.e. \({X}_{i{j}^{*}t}={P}_{{j}^{*}}={N}_{{j}^{*}}=0\).
Source attribution models
As a base case we considered a model with no covariates and no temporal variation, which was equivalent to the Modified Hald model applied to the whole study period [19]. As a sensitivity analysis we compared this to a model where all the type efficiency terms \({q}_{i}\) were fixed at one, which was equivalent to the Dirichlet model of Liao et al. [20] with the addition of exposure weights. We considered each model with and without an ‘unsampled source'.
We then considered models with combinations of covariates (age, rurality, and gender) with and without temporal variation. In models adjusting for age, we used age categories: 0–4, 5–19, 20–34, 35–64, and 65 and over. The rurality of each case was determined by matching the Australian Bureau of Statistics 2011 Statistical Area 2 (SA2) code provided for the case to one of the five rurality zones (Major Cities, Inner Regional, Outer Regional, Remote, and Very Remote) defined in the 2011 Australian Statistical Geography Remoteness Structure. However, as rurality was defined at the more granular SA1 level and each SA2 is built from SA1s, a few SA2s contained regions with different categories of rurality. Cases associated with these SA2s were assigned the rurality zone closest to the average rurality of the constituent SA1s. In our main analyses, rurality was modelled as an ordinal variable. In a sensitivity analysis we modelled rurality as a nominal categorical variable, combining the five categories down to three categories (‘Major Cities’, ‘Regional’ and ‘Remote’) to increase the number of cases in each category. The small number of cases that had a missing value for gender, age, or location were excluded only from analyses involving the missing covariate.
In models with temporal variation, we considered four three-year periods: 2008–2010, 2011–2013, 2014–2016, and 2017–2019. As there were > 200 isolates per period from broilers and > 450 isolates per period from layers, the relative frequency of serotypes was estimated independently for each period for broilers and layers. As there were relatively few isolates for ruminants and pigs in at least one period (< 30 for pigs and < 20 for ruminants), data across all study years was used to estimate the relative frequency of serotypes in ruminants and pigs in every period.
To improve model convergence, we excluded serotypes that rarely caused disease in humans (i.e. <10 human cases across the study period) from the analysis.
Exposure by source
Table 1 provides a summary of prevalence assumptions with references, while Table 2 summarises assumptions on food consumption per capita. The datasets used to determine the number of samples positive for each pathogen type \({(X}_{ijt})\) did not include data on the number of total tests, so the prevalence of Salmonella in food products was estimated from separate surveys, with an adjustment to reduce the number of positive samples by the fraction of cases from excluded serotypes from each source. As age, sex, location, and time-specific consumption and prevalence data were not available for all food sources, exposure weights and prevalence on animal products were assumed fixed across all time periods and the whole population. We used published national data on Salmonella from the E. coli and Salmonella monitoring (ESAM) program from the early 2000s to inform prevalence in ruminants and pork [22, 23], as publicly available contemporary surveys in NSW had insufficient samples to determine prevalence. The relative exposure of the human population to ruminants, pigs, and broilers were based on the apparent consumption of meats from these sources published by the Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES) [24]. For eggs, we assumed consumption of one egg was equivalent to 200g of meat (as in our previous work [11]) and used consumption estimates from a 2018 Australian Eggs annual report [25]. In models with an ‘unsampled’ source we adopted the conservative assumption that exposure to the unsampled source was equal to the source or group of sources with the highest consumption.
Table 1
Salmonella prevalence assumptions by source with references.
|
Prevalence (95% CI)
|
Prevalence adjusted for types rare in cases (95% CI)
|
Reference
|
Chicken meat
|
48.4% (42.0-54.8)
|
38.6% (32.5–45.0)
|
NSW-specific data in Table 4 [49]
|
Chicken eggs
|
1.76% (0.70–3.59)
|
1.44% (0.66–2.72)
|
Following prior assumptions [11]
|
Pigs
|
1.88% (1.57–2.22)
|
1.14% (0.90–1.41)
|
National ESAM† data [22]
|
Ruminants
|
0.38% (0.33–0.43)
|
0.37% (0.32–0.42)
|
National ESAM† data [23]
|
† E. coli and Salmonella monitoring program |
Table 2
Relative exposure to potential sources of Salmonella, measured by mean consumption of meat and animal products per person per year in Australia.
|
Relative Exposure (kg/person/year
or equivalent)
|
Reference
|
Chicken meat
|
47
|
ABARES† [24]
|
Chicken eggs
|
49‡
|
Australian eggs [25]
|
Pork
|
28
|
ABARES† [24]
|
Ruminants
|
34
|
ABARES† [24]
|
†Australian Bureau of Agricultural and Resource Economics and Sciences |
‡Mean consumption was approximately 245 eggs per person per year in the 2017-18 financial year. The relative exposure was calculated assuming that one egg is equivalent to 200g of meat. |
Software
All analyses were conducted in the R programming environment [26]. Bayesian inference was performed using the No U-turn Algorithm with the Stan programming language [27] and the rstan R package [28]. Data cleaning and manipulation was done using the plyr [29], dplyr [30], and tidyr [31] R packages. Data visualisations were made with the ggplot2 R package [32].