We introduce Kraken, a highly flexible heterogeneous system-on-chip (SoC) fabricated in 22 nm FDX technology that demonstrates leading-edge energy efficiency and computational capabilities for ultra-low-power extreme-edge applications. Inspired by the multi-modal perception system of mammals, Kraken supports multi-modal sensing and perception via event-based and sampling-based sensor interfaces, coupled with a spiking neural network accelerator (SNE) and a ternary neural network accelerator (CUTIE), respectively. An octa-core 32-bit RISC-V cluster with ML and DSP extensions enables sensor fusion and flexible parallel data processing. A fabric controller (FC) featuring a dedicated core manages all compute subsystems and handles tasks of low computational intensity. We analyzed Kraken’s performance and efficiency: SNE consumes 2.6 pJ/SOp and 470 μJ/inference for eye gaze prediction with 64% accuracy. CUTIE achieves 590 TOp/s/W and 2.72 μJ/inference for foveated object classification with 88% accuracy. The RISC-V cluster consumes 0.45 pJ per 8-bit MAC operation and 12.01 μJ/inference for eye tracking with a mean average error of 0.24. All of Kraken’s subsystems improve upon the state of the art (SOA), while the heterogeneous architecture offers extreme flexibility in supporting ultra-low-power multi-modal AI fusion applications.
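
The efficiency figures above are quoted in mixed units (energy per operation for SNE and the cluster, throughput per watt for CUTIE). As a reading aid, the following minimal sketch converts between the two forms using only the identity 1 W = 1 J/s, so Op/s/W equals Op/J. The function names are hypothetical and chosen for illustration. The three subsystems count different kinds of operations (synaptic operations, ternary operations, 8-bit MACs), so the converted values are not directly comparable with one another.

```python
def pj_per_op_to_tera_ops_per_joule(pj_per_op: float) -> float:
    """Convert energy per operation (pJ/Op) to efficiency (TOp/s/W).

    Since 1 W = 1 J/s, Op/s/W is the same as Op/J, and
    1 / (1e-12 J/Op) = 1e12 Op/J, so the result is simply 1 / pj_per_op.
    """
    return 1.0 / pj_per_op


def tera_ops_per_joule_to_fj_per_op(tops_per_watt: float) -> float:
    """Convert efficiency (TOp/s/W) to energy per operation (fJ/Op)."""
    return 1000.0 / tops_per_watt


# Figures taken from the abstract; only the unit conversion is added here.
sne_tsop_per_w = pj_per_op_to_tera_ops_per_joule(2.6)        # synaptic ops
cutie_fj_per_op = tera_ops_per_joule_to_fj_per_op(590.0)     # ternary ops
cluster_tmac_per_w = pj_per_op_to_tera_ops_per_joule(0.45)   # 8-bit MACs

print(f"SNE:     {sne_tsop_per_w:.3f} TSOp/s/W")
print(f"CUTIE:   {cutie_fj_per_op:.2f} fJ/Op")
print(f"Cluster: {cluster_tmac_per_w:.2f} TMAC/s/W")
```

Under these conversions, 2.6 pJ/SOp corresponds to roughly 0.38 TSOp/s/W, 590 TOp/s/W to roughly 1.7 fJ/Op, and 0.45 pJ/MAC to roughly 2.2 TMAC/s/W.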