The first question we wanted to answer with the case studies was whether the standard design employed in the 36 trials was appropriate, i.e. whether it provides a high chance of generating the data needed for a successful kinetic evaluation: how often do we get at least one model with an acceptable fit?
In our evaluation we did not use an exclusion criterion but rather aimed to “punish” bad fits by multiplying Chi2 x fit score x residuals score, which yields the fit quality parameter FQ. Since the best possible score for fit and residuals is 1, the best possible FQ value equals the Chi2 result. Following FOCUS (2014) guidance, a threshold of 15% for Chi2 is used for laboratory trials, and 25% would still be good enough for field trials. Thus, FQ values of ≤ 25 would indicate an overall good fit of the model to the data. FQ values of up to 100 would result for trials with acceptable visual scores (2) for both the fit and the residuals, combined with a Chi2 value of up to 25%.
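The FQ arithmetic described above can be illustrated with a minimal sketch. The function name and the example inputs are chosen here for illustration only; the scoring itself (Chi2 in percent, visual scores with 1 as the best value) follows the description in the text.

```python
def fit_quality(chi2_percent, fit_score, residual_score):
    """Return FQ = Chi2 x fit score x residuals score (lower is better)."""
    return chi2_percent * fit_score * residual_score

# Best possible visual scores (1 and 1): FQ equals the Chi2 value.
fq_best_scores = fit_quality(15.0, 1, 1)

# Acceptable visual scores (2 and 2) with a Chi2 of 25%: FQ reaches 100.
fq_borderline = fit_quality(25.0, 2, 2)

print(fq_best_scores, fq_borderline)
```

With these inputs, the second case reproduces the upper bound of 100 discussed above.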
Hence, a possible target for a standard design could be to achieve FQ ≤ 100. In our data set of 36 trials, all FQ values for the best fit were below 100 except for trial 18-2954-02, which had FQ = 104 and was thus very slightly above this target (Table 1). That means our target was achieved in 97% (35 of 36) of the evaluated trials, indicating that the sampling scheme of days 0, 1, 2, 3, 5, 7 and 10 may be a suitable balance between measurement effort and the reliability of the outcome.
However, comparison with the results from the truncated datasets (SFO3: only sampling data for days 0, 3 and 10 retained) indicated that even such a limited design would often provide acceptable estimates of the residue dissipation rates (Table 6). Compared to the 21-d RES from the best fit model for each trial, the 21-d RES for the SFO3 evaluations were on average within +/- 10%, except for one compound (fluopyram) where the mean difference was 16% (and that in the direction of an underestimation of the dissipation rate, i.e. on the conservative side and thus of no regulatory concern). With only 3 data points, the uncertainty of the fit is potentially higher than with more data points. In our data set, the 21-d RES for SFO3 is < 95% of the 21-d RES of the best fit in 12 of 36 trials (33%). This is comparable to the number of trials (14) where the 21-d RES for SFO7 is < 95% of the best fit. However, the variation of the mean 21-d RES for SFO3 (CoV 30.6%) is slightly higher than that of the best fit 21-d RES (CoV 24.6%).
EFSA (2019) stated that the number of samplings should never be < 4 for an acceptable residue dissipation trial. Our assessment would indicate, however, that the error from accepting an SFO evaluation of trials with only 3 samplings is often small, at least if the 3 sampling dates are suitably arranged. Taking into account that a typical foliage residue SFO-DT50 was found at about 3 days (Ebeling & Wang 2017), a good coverage of the dissipation can be expected with samplings on day 0 (100%), day 3 (50%) and day 10 (ca. 10%), and that is probably the reason why the 21-d RES results with SFO3 are so similar to the 21-d RES from the respective best-fit model with the complete data (all 7 samplings) (Table 6). Another reason is that the 21-d RES is an integrated quantity, which smooths deviations of single measuring points and is inherently more robust. Thus, regulatory acceptance of data sets with a low but well-timed number of samplings could be considered case by case, for instance when the initially measured residues have declined to about 10-20% by the last sampling and the resulting DT50 is no obvious outlier.
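The coverage argument for the three sampling days can be checked with a short SFO calculation. The following is a minimal sketch assuming an SFO-DT50 of exactly 3 days; the function name is illustrative only.

```python
import math

def sfo_remaining_percent(t_days, dt50_days):
    """Percent of the initial residue remaining at time t under SFO kinetics."""
    k = math.log(2) / dt50_days  # first-order rate constant (1/d)
    return 100.0 * math.exp(-k * t_days)

# Sampling days of the truncated SFO3 design, with an assumed DT50 of 3 d.
for day in (0, 3, 10):
    print(day, round(sfo_remaining_percent(day, 3.0), 1))
```

Under this assumption the three samplings span the range from the initial residue down to roughly one tenth of it, which is the coverage referred to above.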
On the other hand, our data evaluation suggests that using a surrogate DT50 obtained by dividing the FOMC-DT90 by 3.32 (as mentioned in EFSA 2019) often results in a significant and unnecessary underestimation of the dissipation rate compared with the respective best fit model (Table 5). This surrogate DT50 can also be calculated from the DT90 of DFOP or HS fits generated with KinGUII if their fit is better than FOMC (which is often the case in our data set), and this would be appropriate where a tool like TREC is not available or not in use. However, when an FOMC model has already been fitted to the data (as would be needed to determine an FOMC-DT90), it would appear simple enough and appropriate to use the actual FOMC parameters alpha and beta directly in a tool like TREC instead of a surrogate SFO-DT50.
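The difference between the surrogate DT50 and the DT50 of the FOMC curve itself can be made concrete with the standard FOMC expression C(t) = C0 / (t/beta + 1)^alpha. The sketch below uses hypothetical parameter values (alpha = 1, beta = 2 d) that are not taken from any of the trials; the function name is likewise illustrative. The divisor 3.32 corresponds to ln(10)/ln(2), the DT90/DT50 ratio of an SFO curve.

```python
def fomc_dt(alpha, beta, percent_dissipated):
    """Time until the given percentage has dissipated under FOMC kinetics."""
    remaining = 1.0 - percent_dissipated / 100.0
    return beta * (remaining ** (-1.0 / alpha) - 1.0)

# Hypothetical FOMC parameters, chosen only to illustrate the effect.
alpha, beta = 1.0, 2.0

dt50_fomc = fomc_dt(alpha, beta, 50)   # DT50 read from the FOMC curve
dt90_fomc = fomc_dt(alpha, beta, 90)   # DT90 read from the FOMC curve
dt50_surrogate = dt90_fomc / 3.32      # surrogate SFO-DT50 back-calculated from DT90

print(dt50_fomc, dt90_fomc, round(dt50_surrogate, 2))
```

For a markedly biphasic curve such as this one, the surrogate DT50 is considerably longer than the DT50 of the fitted curve, i.e. the initial dissipation is represented as slower than it was measured.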
Justification for including all 4 kinetic KinGUII models in the evaluation, instead of limiting the assessment to SFO, can also be deduced from the comparison of the 21-d RES with SFO to the 21-d RES with the respective best fit model (Table 4): in ca. 40% of all cases (14 of 36), the 21-d RES with SFO is smaller than with the respective best fit model (i.e., in these cases SFO underestimates the MAF and TWA). In general, the difference is not large (on average 7% over all trials), but for a specific compound the difference can be larger (e.g., about 40% for spiroxamine), particularly if the SFO fits are visually bad or borderline and thus difficult to accept. For individual trials, the difference was over 50% (16-2958-01, 17-2950-02). Especially for trials with bad or borderline SFO fits, bi-phasic modelling is the best way to use the information they provide. Thus, inclusion of non-SFO kinetics allows more realistic and robust DT50 determinations, and can lead to more protective refined risk assessments.
In their literature review, Fantke & Juraske (2013) concluded that residue dissipation would generally be well described by single-first-order kinetics, which appears to contradict our findings. However, they could calculate non-SFO fits only for part of the studies in their database (due to the lack of detailed residue concentrations in many of the original publications), and apparently applied a factor of 2 as the threshold for differences between DT50s. In view of the large variability in the data set, this may be a reasonable approach, but in a regulatory context a factor of 2 may often be relevant for decision making. Furthermore, the apparent DT50 in non-SFO kinetics only describes the time to dissipation of the first 50% and says very little about the dissipation rate of the second half of the residues, so it is not surprising that they did not see the better fit of the non-SFO kinetics that we found in our case studies. Nevertheless, Fantke & Juraske (2013) also acknowledged that there are residue decline data that are significantly better described by kinetics other than SFO, and suggested checking the fit of different kinetic models in future experimental studies.
There may also be reasons for non-SFO behaviour from a theoretical point of view. We consider here the total plant residues, which are the sum of residues on the plant surface and in different plant tissues. These residues may be subject simultaneously to different dissipation processes depending on their location. Residues in the plant may be redistributed, degraded, diluted by growth or recharged by uptake from the plant surface. Residues on the plant surface may dissipate by photolysis, volatilisation, wash-off or uptake into the plant. Even assuming that each single process can be described by SFO, the overlay of these processes with different individual rate constants can easily produce an overall non-SFO behaviour.
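How an overlay of SFO processes produces non-SFO behaviour can be shown with a deliberately simplified sketch: two independent residue pools (surface and tissue), each declining by SFO with its own rate constant, and no transfer between them. The pool sizes and rate constants are hypothetical values chosen for illustration, not estimates from the trials.

```python
import math

def total_residue(t, surface0=70.0, k_surface=0.5, tissue0=30.0, k_tissue=0.05):
    """Sum of two independent SFO pools (hypothetical sizes and rate constants)."""
    return surface0 * math.exp(-k_surface * t) + tissue0 * math.exp(-k_tissue * t)

def apparent_rate(t, dt=1e-4):
    """Apparent first-order rate constant, -d ln(C)/dt, by finite difference."""
    return -(math.log(total_residue(t + dt)) - math.log(total_residue(t))) / dt

# For a true SFO decline this value would be constant over time.
for day in (0, 5, 20):
    print(day, round(apparent_rate(day), 3))
```

The apparent rate constant of the total residue falls over time, from a value dominated by the fast pool towards that of the slow pool, which is the bi-phasic pattern that DFOP-type models are designed to describe.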
Thus, there are overall good reasons to expand the range of kinetic models beyond the standard SFO, and to include at least the 3 additional models used for regulatory environmental exposure assessment (DFOP, FOMC, HS) when evaluating residue decline trials in the ecotoxicology area, at least for foliage residues as the key element of the herbivorous bird and mammal assessment, which is typically the driver behind the need for more realistic exposure assessments. Appropriate tools for this purpose are now available with KinGUII and TREC, and the outcome of our case study evaluations confirms that this is possible without undue extra effort, and that it leads to more robust refinements in the risk assessment. We would therefore recommend considering these suggestions in the ongoing revision of the EFSA GD for birds and mammals.
Interestingly, all 6 of our case study compounds are included in the residue data compilation and evaluation of Fantke et al. (2014), who aimed to develop a model predicting half-lives in plants based on various parameters like structural similarities of physicochemical properties. They report modelled mean half-lives (and 95% confidence intervals) of 5.51 d (1.29−23.61) for fluopyram, 3.72 d (1.80−7.70) for trifloxystrobin, 10.20 d (5.70−18.25) for spiroxamine, 8.16 d (1.91−34.94) for fluopicolide, 7.67 d (6.46−9.11) for tebuconazole, and 3.50 d (3.17−3.87) for propineb. Except for spiroxamine, the modelled DT50s with their confidence ranges are similar to our results, but for spiroxamine their modelled dissipation appears much slower than measured in our trials. This may be related to the pronounced non-SFO profile of spiroxamine in our case study trials, which might be less well captured in the model fits used by Fantke et al. (2014). Additionally, their modelled DT50 of spiroxamine is also based on trials with plant material other than the cereal foliage in our case studies, where the dissipation rate may be different. Fantke et al. (2014) have also compared and discussed the range of factors behind the variability between compounds, crops and trials, and conclude that “there is more than one process contributing to overall dissipation from plants and that these processes go in a counter-direction”. For that discussion, the reader is directed to their paper.
In any case, it should be noted that the purpose of our evaluations was not to establish a regulatory DT50 for the 6 compounds but to evaluate a set of trials with a standard design. This meant that we had to exclude some other residue decline studies which are available, but in unequal numbers of trials per compound and partly with deviating sampling schemes. Here we decided to work with a deliberately reduced but standardised data set, whilst for a DT50 proposal for regulatory risk assessments all available data per compound should be considered. Therefore, the results of our case studies are not to be used directly in regulatory assessments without also taking into account the results of the other residue decline trials which are not included here. However, we would not expect large changes in the conclusions if we had evaluated all currently available studies for our 6 case study compounds. Overall, the rapid dissipation rates which we found here in cereal foliage are in very good agreement with the findings by Ebeling and Wang (2018) for a variety of leafy plant matrices.
Based on these findings, short DT50 values may be typical for foliage residues in ground vegetation of relevance for the exposure of herbivorous birds and mammals on treated fields. We used an application interval of 10 days, which is in our experience quite typical for fungicides that need to be applied repeatedly to maintain efficacy (in fact, our short DT50s may explain why they need to be applied at relatively short intervals). Certainly, there are cases with more than 2 fungicide applications, but 2 is quite typical for the compounds which we assessed, and overly frequent applications can also pose problems with resistance development. Where many applications with short intervals falling into the 21-d time window need to be assessed, the use of TREC is even more attractive because the work saved with that automatic calculation tool may be even more considerable. The 21-d time window for bird and mammal risk assessment is a convention with no explicit justification. However, this duration appears to fit the duration of key reproductive phases in the toxicity studies that generate the risk assessment endpoint, like the embryonic phase for the avian test species (Bobwhite quail, Mallard duck), or the gestation and lactation phases in the rat reproduction study which is typically used for wild mammal risk assessment.
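To illustrate how a short DT50, a 10-day application interval and the 21-d window interact, the following minimal sketch uses the usual closed-form SFO expressions for a multiple application factor (MAF) and a time-weighted average (TWA) factor. It is an illustration under stated assumptions only: the DT50 of 3 days is an assumed value, the function names are hypothetical, and the sketch does not claim to reproduce the calculations performed by TREC, which also handles non-SFO kinetics.

```python
import math

def sfo_rate(dt50_days):
    """First-order rate constant (1/d) from an SFO-DT50."""
    return math.log(2) / dt50_days

def sfo_maf(n_applications, interval_days, dt50_days):
    """SFO multiple application factor: peak after n applications relative to one."""
    ki = sfo_rate(dt50_days) * interval_days
    return (1.0 - math.exp(-n_applications * ki)) / (1.0 - math.exp(-ki))

def sfo_twa_factor(window_days, dt50_days):
    """SFO time-weighted average over the window relative to the initial residue."""
    kt = sfo_rate(dt50_days) * window_days
    return (1.0 - math.exp(-kt)) / kt

# Assumed scenario: 2 applications, 10 d apart, SFO-DT50 of 3 d, 21-d window.
maf = sfo_maf(2, 10.0, 3.0)
twa = sfo_twa_factor(21.0, 3.0)
print(round(maf, 3), round(twa, 3))
```

With a DT50 this short, the residue carried over from the first application adds only about 10% to the peak after the second one, and the 21-d average is a small fraction of the initial residue.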
We developed the approach of calculating fit quality scores as the product of Chi2 x fit score x residuals score specifically for this article, and found that it worked well and was reasonably effective: the visual fit helps to detect a biphasic nature, and the residual plot helps to assess the amount of scatter, not only as an average number (as in the Chi2 value) but also with regard to its location on the curve, the systematic nature of the scatter, and the relative distance to the straight line (i.e. the “weight” of the scatter). For our exercise, we feel quite comfortable not applying strict triggers for the acceptability of a trial, but rather punishing bad fits so that the best fit is (relatively) easy to identify. Whether that best fit is good enough in a regulatory context depends on that context, e.g. the level of conservatism required or the overall weight of evidence under consideration.
In our opinion, the visual fit scores should be more important than other criteria like parameter uncertainty, unless significant temporal extrapolation is needed (as for example in FOCUS groundwater assessments, where comparatively low residues matter if they persist over long periods).
Our focus is primarily on the use of foliar residue decline data in the risk assessment for herbivorous birds and mammals, where the time window is short and residues that have declined below 10-20% of the peak are usually not of concern. Furthermore, the vegetation on arable fields is regularly removed by harvest, mowing, plowing and other measures, so that long-term kinetics in foliage are of much lower relevance than long-term kinetics in soil.
Therefore we did not incorporate parameter uncertainty in our evaluation. Sources of prediction uncertainty like representation of variable environmental conditions would in our view be best addressed by a sufficient number of trials conducted under contrasting but relevant conditions.
The primary purpose of our paper is to explore how newly available calculation tools could be used to evaluate plant residue dissipation kinetics in a regulatory context. We found that the additional application of non-SFO kinetics certainly increases the workload for the evaluation, but not very much when KinGUII and TREC are used as calculation tools. Further research would be useful to better assess the extent to which non-SFO kinetics better fit foliage residue decline, but our limited explorations suggest that this may be the case for a significant proportion of trials. Therefore we would like to encourage the use of non-SFO kinetic models in the regulatory risk assessment for herbivorous birds and mammals, and the provision of detailed related guidance in the ongoing revision of the EFSA GD (2009).