Pain intensity is an essential indicator of patient discomfort, and video-based analysis for estimating pain intensity has emerged as a significant area of research. Convolutional neural network (CNN)-based spatiotemporal models are widely used to extract dynamic pain information from videos for pain intensity assessment. Nonetheless, effectively extracting spatiotemporal dynamic information remains challenging, because pain typically appears only in specific consecutive frames and within certain facial regions. This work therefore introduces an attention-aware spatiotemporal network (AASTNet) for pain intensity estimation, comprising temporal and spatial sub-networks. Specifically, the temporal sub-network is followed by a feature gating mechanism, analogous to self-attention, that selectively focuses on the frames in a video sequence that carry pain-related information. An attention mechanism is likewise integrated into the spatial sub-network to emphasize the spatially localized regions closely related to pain. To compensate for the loss of spatially global information induced by the attention mechanism, the method incorporates the geometric information of facial landmarks to capture the overall topological patterns of the facial regions associated with pain. The feature vectors derived from the two sub-networks are then fused and fed into a regression head to estimate pain intensity. The proposed AASTNet is evaluated on the UNBC-McMaster Shoulder Pain Expression Archive Database, and the results show that it outperforms contemporary state-of-the-art methods.
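The data flow described above (temporal feature gating, spatial attention pooling, landmark-based geometric features, and fusion followed by regression) can be illustrated with a minimal NumPy sketch. All dimensions, random weights, and the linear regression head below are illustrative placeholders, not the actual AASTNet, which uses trained CNN sub-networks and learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    """Numerically stable softmax used for both attention mechanisms."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: T frames with D-dim features, HW spatial locations
# with C channels, and L 2-D facial landmarks.
T, D, HW, C, L = 16, 64, 49, 32, 66

frame_feats = rng.standard_normal((T, D))      # temporal sub-network output
spatial_feats = rng.standard_normal((HW, C))   # spatial sub-network feature map (flattened)
landmarks = rng.standard_normal((L, 2))        # facial landmark coordinates

# Temporal feature gating: score each frame, normalize over time,
# then take an attention-weighted sum so pain-relevant frames dominate.
w_gate = rng.standard_normal(D)
gate = softmax(frame_feats @ w_gate)           # (T,) weights over frames
temporal_vec = gate @ frame_feats              # (D,) gated temporal descriptor

# Spatial attention: weight locations so pain-related regions dominate.
w_att = rng.standard_normal(C)
att = softmax(spatial_feats @ w_att)           # (HW,) weights over locations
spatial_vec = att @ spatial_feats              # (C,) attended spatial descriptor

# Geometric landmark features supply the global facial topology that
# the localized spatial attention alone would miss.
geom_vec = landmarks.flatten()                 # (2L,)

# Fuse the sub-network outputs and regress a scalar pain intensity.
fused = np.concatenate([temporal_vec, spatial_vec, geom_vec])
w_reg = rng.standard_normal(fused.shape[0])
pain_intensity = float(fused @ w_reg)
```

The two softmax-normalized weightings play the roles of the temporal gating and spatial attention mechanisms, while concatenation stands in for the paper's feature fusion step.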