The well-established Bath's law states that the average magnitude difference between a mainshock and its strongest aftershock is roughly 1.2, independently of the size of the mainshock. The main challenge in calculating this value is the bias introduced by missing data points when the strongest aftershock is below the observed cut-off magnitude. Ignoring missing values leads to a systematic error, because the data points removed are those with particularly large magnitude differences $\Delta M$. The error is minimized, if we restrict the statistics to mainshocks at least two magnitude units above the cut-off, but then the sample size is strongly reduced. This work provides an innovative approach for modelling $\Delta M$ by adapting methods for time-to-event data, which often suffers from incomplete observation (censoring). In doing so, we adequately account for unobserved values and estimate a fully parametric distribution of the magnitude differences $\Delta M$ for $M>6$ mainshocks. Results show that magnitude differences are best modeled by the Gompertz distribution, and that larger $\Delta M$ are expected at increasing depths and higher heat flows. A simulation experiment suggests that $\Delta M$ is mainly driven by the number and the magnitude distribution of aftershocks. Therefore, in a second study, we modelled the variation of aftershock productivity in a stochastically declustered local catalog for New Zealand, using a generalized additive model approach. Results confirm that aftershock counts can be better modelled by a Negative Binomial than a Poisson distribution. Interestingly, there is indication that triggered earthquakes trigger themselves two to three times more aftershocks than comparable background events. This effect can either be an indicator for incorrect trigger pair assignments as a result of the declustering approach, or it may represent an actual effect due to a higher prevalent energy level in the tectonic system during on-going earthquake sequences. If confirmed, this effect must be carefully considered in forward simulations of earthquake sequences, as otherwise there is a risk of substantially underestimating cluster sizes and consequently overestimating $\Delta M$.