In this paper, we propose alternative strategies for Convolutional Neural Network (CNN) model segmentation, addressing inference on computing architectures composed of multiple Edge TPUs. Specifically, we compare inference performance for a number of state-of-the-art CNN models, taking as references the inference time on a single TPU and the compiler-based pipelined inference implementation provided by Google's Edge TPU compiler. Starting from a profile-based segmentation strategy, we introduce further refinements to balance the workload across multiple TPUs, leveraging their cooperative computing power, reducing work imbalance, and alleviating the memory-access bottleneck caused by the limited amount of on-chip memory per TPU. The observed results yield super-linear speedups with respect to a single TPU and accelerations of up to 2.60x over the multi-TPU segmentation produced by the compiler.
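For context, the compiler-based baseline referenced above corresponds to the Edge TPU compiler's `--num_segments` option combined with PyCoral's pipelined model runner. The sketch below illustrates that baseline, assuming a model split into four segments mapped onto four attached Edge TPUs; the segment file names, device indices, and input values are illustrative, not taken from the paper's experiments.

```python
# Baseline sketch: compiler-segmented pipelined inference on multiple Edge TPUs.
# Segments are first produced offline with Google's Edge TPU compiler, e.g.:
#   edgetpu_compiler --num_segments=4 model.tflite
import numpy as np
from pycoral.utils.edgetpu import make_interpreter
from pycoral.pipeline.pipelined_model_runner import PipelinedModelRunner

# Segment filenames follow the compiler's naming scheme (illustrative here).
segment_paths = [f"model_segment_{i}_of_4_edgetpu.tflite" for i in range(4)]

# Bind each segment to its own Edge TPU device (":0", ":1", ...).
interpreters = []
for i, path in enumerate(segment_paths):
    interp = make_interpreter(path, device=f":{i}")
    interp.allocate_tensors()
    interpreters.append(interp)

runner = PipelinedModelRunner(interpreters)
input_detail = interpreters[0].get_input_details()[0]
name, shape = input_detail["name"], input_detail["shape"]

# Push a stream of inputs; segments execute concurrently on their TPUs, so
# steady-state throughput is bounded by the slowest (least balanced) segment,
# which is the imbalance the refinements in this paper aim to reduce.
for _ in range(8):
    runner.push({name: np.zeros(shape, dtype=np.uint8)})
runner.push({})  # an empty push signals end of stream

while True:
    result = runner.pop()
    if not result:
        break
    # consume result dict (output tensors) here
```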