Network traffic classification is an important task for ensuring network security and managing network resources. Existing approaches rely on features predefined by experts, which introduces considerable uncertainty when they are applied to network traffic classification. These features must also be updated continuously, which makes model migration and deployment difficult. In contrast, this study proposes a novel deep learning-based method that accurately determines network traffic characteristics. First, the traffic data are transformed into images with texture features. Then, based on the characteristics of this input data, we propose a multitask classification model for malicious and encrypted traffic, called the multilevel spatiotemporal feature fusion enhanced network traffic classification model (MLST-FENet). MLST-FENet is an end-to-end framework that automatically learns the nonlinear relationship between input and output. Experiments on the USTC-TFC2016 and ISCX VPN-nonVPN datasets show that MLST-FENet achieves better detection and classification performance for malicious and encrypted traffic and generalizes well. It can therefore be used in many practical application scenarios and provide more valuable information for the field of network security.
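The abstract does not specify how traffic is turned into images. The sketch below illustrates one common byte-to-image mapping, offered only as an assumption: each raw byte becomes one grayscale pixel, and the byte stream is truncated or zero-padded to a fixed square size. The 28 × 28 size and the function name `bytes_to_image` are hypothetical choices for illustration, not details taken from MLST-FENet.

```python
from typing import List

# Hypothetical image side length: 28 x 28 = 784 bytes. This is an
# assumption for illustration, not a parameter confirmed by the paper.
IMAGE_SIDE = 28


def bytes_to_image(payload: bytes, side: int = IMAGE_SIDE) -> List[List[int]]:
    """Map raw traffic bytes to a side x side grayscale image (hypothetical helper).

    Each byte value (0-255) becomes one pixel intensity. Inputs longer than
    side * side bytes are truncated; shorter inputs are zero-padded.
    """
    n = side * side
    # Keep at most n bytes, then pad with zeros up to exactly n pixels.
    pixels = list(payload[:n]) + [0] * max(0, n - len(payload))
    # Reshape the flat pixel list row by row into a 2-D grid.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    # A short payload starting with a TLS record header (0x16 0x03 0x01).
    img = bytes_to_image(b"\x16\x03\x01" + bytes(10))
    print(len(img), len(img[0]), img[0][:3])
```

Fixed-size truncation and padding give every sample the same shape, which a convolutional or spatiotemporal model needs as input. Byte patterns that recur across flows then appear as repeated visual textures.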