Humans can produce complex movements when interacting with their surroundings. This relies on planning various movements and subsequently executing them. In this paper, we investigate this fundamental aspect of motor control in the setting of autonomous robotic operation. We consider hierarchical generative modelling, for autonomous task completion, that mimics the deep temporal architecture of human motor control. Here, temporal depth refers to the nested time scales at which successive levels of a forward or generative model unfold: for example, the apprehension and delivery of an object requires a global plan that contextualises the fast coordination of multiple local limb movements. This separation of temporal scales can also be motivated from a robotics and control perspective: to ensure versatile sensorimotor control, high-level planning and low-level motor control of individual limbs must be structured hierarchically. We use numerical experiments to establish the efficacy of this formulation and demonstrate how a humanoid robot equipped with a hierarchical generative model can autonomously solve a complex task requiring locomotion, manipulation, and grasping. In particular, the humanoid robot can retrieve and deliver a box, and open and walk through a door to reach its final destination. Our approach and experiments illustrate the effectiveness of human-inspired motor control algorithms, which offer a scalable hierarchical architecture for the autonomous completion of complex goal-directed tasks.
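The separation of temporal scales described above can be illustrated with a minimal sketch, which is not the paper's model: a slow high-level planner that re-plans a waypoint only every few time steps, nested around a fast low-level controller that tracks the current waypoint at every step. All function names and parameters here (`slow_planner`, `fast_controller`, `planner_period`, `gain`) are illustrative assumptions, not from the original work.

```python
# Conceptual sketch of nested time scales in hierarchical control.
# NOT the paper's generative model -- a toy 1-D example for intuition.

def slow_planner(position, goal, step_size=1.0):
    """High level (slow scale): propose the next waypoint toward the goal."""
    direction = 1.0 if goal > position else -1.0
    return position + direction * min(step_size, abs(goal - position))

def fast_controller(position, waypoint, gain=0.5):
    """Low level (fast scale): one proportional update toward the waypoint."""
    return position + gain * (waypoint - position)

def run(start, goal, planner_period=10, n_steps=100):
    """The planner fires once every `planner_period` fast control steps."""
    position, waypoint = start, start
    for t in range(n_steps):
        if t % planner_period == 0:                      # slow time scale
            waypoint = slow_planner(position, goal)
        position = fast_controller(position, waypoint)   # fast time scale
    return position

final = run(start=0.0, goal=5.0)
print(abs(final - 5.0) < 0.1)  # the agent ends near the goal
```

The high level never issues motor commands directly; it only contextualises the fast loop by setting its target, mirroring the global-plan/local-coordination split motivated in the text.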