With the continuous development of information technology, emerging technologies such as 5G networks, Turing devices and deep learning are now widely used in daily life. At the same time, the sports and tourism industry has become a pillar industry supporting the national economy: sports events attract wide attention, people are increasingly willing to take part in sports activities, and the numbers of participants and visitors grow day by day. To meet the resulting demand, this paper studies the rapid detection of human behavior in video, with the aim of obtaining more representative features, which can in turn support the development of the industry. First, it proposes a 3D convolutional neural network model that combines a multi-level pyramid network with an attention mechanism, which allows the network to process the input video at different scales and extract deeper video information. Second, it proposes an integrated neural network based on visual attribute enhancement, in which the visual attributes in the video behavior data set are explicitly learned to refine the classification of behaviors with similar features. This network also shows good adaptability and can process video data as a time series. Temporal and spatial features are extracted with a two-stream neural network method in which the two networks learn the corresponding pixel ratios, so that cues for action recognition are not easily lost. Finally, based on the analysis of the above methods, this paper proposes a video-based human body recognition algorithm built on a spatio-temporal convolutional neural network and applies it to the leisure tourism industry to promote the development of the industry.