Due to the lack of gesture datasets for multi-person scenarios, multi-person gesture detection generally relies on two-stage schemes, leading to high computation and time costs. In this paper, we propose a method to automatically create a multi-person gesture dataset, which can quickly generate annotated gestures in complex scenarios without manual labeling. We then treat each gesture type as a distinct category and use a one-stage approach to detect and classify all gestures in a multi-person scene in a single pass. Among popular object detection backbones, the one that performs best on our dataset is pruned for computational efficiency; knowledge distillation is then applied to the pruned model to mitigate the accuracy drop caused by pruning, and the focal loss function is optimized to further improve performance. Experiments show that our approach significantly reduces the processing time of gesture detection in multi-person scenarios while only slightly sacrificing prediction accuracy. In a multi-person scene with ten hands, the inference time of our algorithm is only 12 ms, outperforming off-the-shelf methods by at least 4.3 times.