Deep neural networks in many domains can be trained with increasingly large batch sizes without loss of efficiency, although the degree of data parallelism that can be exploited varies from domain to domain. Training large deep neural networks on large datasets is computationally demanding, which has driven a surge of interest in large-batch optimization: larger batches speed up training and facilitate distributed processing. However, this approach suffers from the well-known "generalization gap," which can degrade performance across datasets, and there is currently limited understanding of how to determine the optimal batch size.
To address this issue, we propose an adaptive tuning algorithm that dynamically adjusts the batch size during training. The algorithm proceeds in four stages: gradient warm-up, loss derivation, calculation of a weighted loss from historical batch size data, and batch size updating. We demonstrate that our algorithm outperforms the traditional constant-batch-size approach by evaluating it on multiple system-call datasets of varying sizes.
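To make the four-stage structure concrete, the following is a minimal sketch of such a scheduler, assuming an exponentially weighted average over the historical (batch size, loss) pairs and a simple multiplicative grow/shrink rule. The class name, hyperparameters, and update rule are illustrative assumptions, not the algorithm specified in this paper.

```python
import numpy as np

class AdaptiveBatchSizeScheduler:
    """Hypothetical sketch of a four-stage adaptive batch size tuner.

    Stages per adjustment cycle:
      1. gradient warm-up  -- the training loop runs `warmup_steps`
                              mini-batches at the current batch size
      2. loss derivation   -- the mean training loss of the cycle is recorded
      3. weighted loss     -- the new loss is compared against an exponentially
                              weighted average of the historical losses
      4. batch size update -- the batch size grows or shrinks depending on
                              whether the weighted loss improved

    All names and update rules below are illustrative assumptions.
    """

    def __init__(self, init_batch_size=64, min_bs=16, max_bs=4096,
                 growth=2.0, decay=0.9, warmup_steps=100):
        self.batch_size = init_batch_size
        self.min_bs, self.max_bs = min_bs, max_bs
        self.growth, self.decay = growth, decay
        self.warmup_steps = warmup_steps
        self.history = []  # list of (batch_size, cycle_loss) pairs

    def weighted_loss(self):
        """Stage 3: exponentially weighted average of historical losses."""
        if not self.history:
            return float("inf")
        losses = np.array([loss for _, loss in self.history])
        # Most recent cycle gets weight 1, older cycles are decayed.
        weights = self.decay ** np.arange(len(losses) - 1, -1, -1)
        return float(np.sum(weights * losses) / np.sum(weights))

    def update(self, cycle_loss):
        """Stages 2-4: record the cycle loss and adjust the batch size."""
        reference = self.weighted_loss()
        self.history.append((self.batch_size, cycle_loss))
        if cycle_loss <= reference:
            # Loss still improving: larger batches keep throughput high.
            self.batch_size = min(int(self.batch_size * self.growth), self.max_bs)
        else:
            # Loss stagnated: shrink the batch to restore gradient noise.
            self.batch_size = max(int(self.batch_size / self.growth), self.min_bs)
        return self.batch_size
```

In this sketch, a training loop would perform the warm-up stage by running `warmup_steps` mini-batches at `scheduler.batch_size`, average the resulting losses, and then call `scheduler.update(mean_loss)` to obtain the batch size for the next cycle.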