Vertical heterostructures (HS) of transition metal dichalcogenides (TMDs) host interlayer excitons (ILX), in which the electron and hole reside in different layers. Compared with their intralayer counterparts, ILX feature much longer lifetimes and diffusion lengths, paving the way to excitonic optoelectronic devices operating at room temperature. While the recombination dynamics of ILX have been intensively studied, the formation process and its underlying physical mechanisms remain largely unexplored. Here we use ultrafast transient absorption spectroscopy with a white-light probe, spanning both intralayer and interlayer exciton resonances, to simultaneously capture and time-resolve interlayer charge transfer and ILX formation dynamics in a MoSe2/WSe2 HS. We find that the ILX formation timescale (∼1 ps) is nearly an order of magnitude longer than the interlayer charge transfer time (∼100 fs). Microscopic calculations attribute this delay to an interplay between the phonon-assisted interlayer exciton cascade and subsequent cooling processes, and the excitonic wave-function overlap. Our results provide an explanation for the efficient photocurrent generation observed in optoelectronic devices based on TMD HS, as the ILX have an opportunity to dissociate during their thermalization process.