The sheer volume of available data makes it difficult for users to sift through and find the information they actually need. Recommender systems address this challenge by leveraging users' historical interaction data to infer their interests and preferences. Existing research on video recommendation focuses on two key aspects: data representation, i.e., the feature representation of videos and the modeling of users; and the difficulties that arise when traditional recommendation methods are applied to video recommendation, such as cold start, data sparsity, and over-specialization. Effectively describing and representing both videos and users is the foundation of video recommendation and ultimately determines the performance of the recommender system.

To address these challenges, this paper introduces a recommendation model based on a multi-graph neural network. Exploiting the ability of graph neural networks to mine deep information from graph-structured data, the model transforms the input user rating information and item side information into multiple graphs for feature extraction. From a multimodal semantics perspective, the paper further proposes a video similarity metric learning method based on semantic information: since a video comprises different media types, each with distinct features, analyzing these features and representing them in a unified manner lays the groundwork for video recommendation. The recommendation model is then enhanced with an attention mechanism, yielding a deep collaborative filtering recommendation model based on attention. Extensive experiments on real-world datasets validate the effectiveness of the proposed model.
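To make the multi-graph idea concrete, the sketch below illustrates one plausible reading of the pipeline described above: a user-item interaction graph is built from the rating matrix, an item-item graph is built from side-information (e.g., multimodal video) features, a single propagation step aggregates neighbors on each graph, and an attention mechanism fuses the two item views before scoring user-item pairs. All names, dimensions, the top-k graph construction, and the fusion rule are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical sketch of the multi-graph construction and attention-based
# fusion described above; not the paper's exact model.
rng = np.random.default_rng(0)
n_users, n_items, d = 4, 6, 8

# (1) User-item interaction graph from the rating matrix (implicit feedback).
ratings = rng.integers(0, 2, size=(n_users, n_items)).astype(float)

# (2) Item-item graph from side information (e.g. multimodal video features),
#     built here with cosine similarity and top-k sparsification (assumed).
side_feats = rng.normal(size=(n_items, d))
unit = side_feats / np.linalg.norm(side_feats, axis=1, keepdims=True)
sim = unit @ unit.T
np.fill_diagonal(sim, 0.0)
k = 2
item_graph = np.zeros_like(sim)
for i in range(n_items):
    nbrs = np.argsort(sim[i])[-k:]
    item_graph[i, nbrs] = sim[i, nbrs]

def row_normalize(a):
    # Degree-normalize rows; rows with no neighbors stay zero.
    deg = a.sum(axis=1, keepdims=True)
    return np.divide(a, deg, out=np.zeros_like(a), where=deg > 0)

# Initial embeddings (would be learned in the actual model).
user_emb = rng.normal(size=(n_users, d))
item_emb = rng.normal(size=(n_items, d))

# One propagation step per graph (a light-weight stand-in for a GNN layer).
item_from_ratings = row_normalize(ratings.T) @ user_emb   # items aggregate their raters
item_from_side = row_normalize(item_graph) @ item_emb     # items aggregate similar items

# Attention-weighted fusion of the two item views (softmax over scalar scores).
w = rng.normal(size=(d,))                                  # shared attention vector (assumed)
scores = np.stack([item_from_ratings @ w, item_from_side @ w], axis=1)
alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
item_final = alpha[:, :1] * item_from_ratings + alpha[:, 1:] * item_from_side

# Predicted preference = inner product of user and fused item representations.
user_final = row_normalize(ratings) @ item_emb
pred = user_final @ item_final.T
print(pred.shape)  # (4, 6): one score per user-item pair
```

In a trained version of such a model, the embeddings and the attention vector would be learned end-to-end against the observed ratings rather than drawn at random as they are in this toy example.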