Machine learning is a powerful tool for designing accurate, highly non-local exchange-correlation functionals for density-functional theory. So far, most machine-learned functionals have been trained on systems with an integer number of particles. As a result, they cannot reproduce some crucial and fundamental properties, such as the explicit dependence of the functional on the particle number or the infamous derivative discontinuity at integer particle numbers. Here we address these problems by training a neural network as the universal functional of density-functional theory that (i) depends explicitly on the number of particles, with piecewise linearity between integer particle numbers, and (ii) reproduces the derivative discontinuity of the exchange-correlation energy. This is achieved by using an ensemble formalism, a training set containing fractional densities, and an explicitly discontinuous formulation.
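To make the two target properties concrete, the following minimal sketch illustrates the standard ensemble result for fractional particle numbers. For N = M + ω with 0 ≤ ω ≤ 1, the exact energy interpolates linearly between integer values, E(M + ω) = (1 − ω) E(M) + ω E(M + 1). Consequently, the slope dE/dN jumps at integer N from −I (ionization energy) on the left to −A (electron affinity) on the right. The energies used here are hypothetical illustrative numbers, and the helper names `ensemble_energy` and `one_sided_slopes` are introduced only for this example. This is not the neural-network functional described in this work.

```python
import math


def ensemble_energy(n, energies):
    """Piecewise-linear ensemble energy E(N) for a possibly fractional N.

    `energies` maps integer particle numbers M to ground-state energies E(M)
    (hypothetical values in this sketch). Between integers:
        E(M + w) = (1 - w) * E(M) + w * E(M + 1),  0 <= w < 1
    """
    m = int(math.floor(n))
    w = n - m
    if w == 0.0:
        return energies[m]  # integer N: no interpolation needed
    return (1.0 - w) * energies[m] + w * energies[m + 1]


def one_sided_slopes(m, energies):
    """Left and right derivatives dE/dN at the integer M.

    Left slope  = E(M) - E(M - 1) = -I (ionization energy)
    Right slope = E(M + 1) - E(M) = -A (electron affinity)
    """
    left = energies[m] - energies[m - 1]
    right = energies[m + 1] - energies[m]
    return left, right


# Hypothetical energies (Hartree) for N = 1, 2, 3 particles.
E = {1: -0.5, 2: -3.0, 3: -2.75}

print(ensemble_energy(1.5, E))  # halfway between E(1) and E(2)
left, right = one_sided_slopes(2, E)
print(left, right, right - left)  # the jump right - left equals I - A
```

The jump I − A in the slope of E(N) is the fundamental gap. In Kohn–Sham theory, the part of this gap that the Kohn–Sham eigenvalue gap does not capture is the derivative discontinuity of the exchange-correlation energy, which a smooth functional trained only on integer-N densities cannot represent.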