Forecasting the evolution of contagion dynamics is still an open problem to which mechanistic models only offer a partial answer. To remain mathematically and/or computationally tractable, these models must rely on simplifying assumptions, thereby limiting the quantitative accuracy of their predictions and the complexity of the dynamics they can model. Here, we propose a complementary approach based on deep learning where the effective local mechanisms governing a dynamic are learned automatically from time series data. Our graph neural network architecture makes very few assumptions about the dynamics, and we demonstrate its accuracy using stochastic contagion dynamics of increasing complexity on static and temporal networks. By allowing simulations on arbitrary network structures, our approach makes it possible to explore the properties of the learned dynamics beyond the training data. Our results demonstrate how deep learning offers a new and complementary perspective to build effective models of contagion dynamics on networks.