With the rapid development of Internet technology, information resources are growing at high speed, giving rise to the problem of information overload: it is difficult for users to obtain the information they need directly and quickly from such massive amounts of data. Recommendation algorithms are therefore widely used to alleviate information overload, and a variety of recommendation methods have been studied. Depending on the implementation idea, recommendation algorithms fall into three main categories: content-based recommendation, collaborative filtering-based recommendation, and hybrid methods. Collaborative filtering is the most widely used; its main idea is to discover correlations between users from their preferences for items and to make recommendations based on those correlations. Although these methods improve the performance of recommender systems, as the numbers of users and items grow, such systems face sparsity and cold-start problems and thus cannot achieve truly personalized recommendation. The purpose of this paper is to leverage multi-modal information to address these problems. Specifically, we devise a novel simplified GCN-based model that incorporates the content information extracted from the visual, acoustic, and textual modalities with the CF signal by propagating along the item-user bipartite graph. Finally, through extensive experiments on public datasets, we demonstrate that our proposed model outperforms several state-of-the-art baselines.
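To make the propagation idea concrete, the following is a minimal sketch (not the authors' exact model) of simplified GCN propagation on a user-item bipartite graph, in the LightGCN style: no feature transformation or nonlinearity, just symmetrically normalized neighborhood aggregation with layer averaging. The function name, the number of layers, and the suggestion that item embeddings could be fused multimodal (visual/acoustic/textual) features are all illustrative assumptions.

```python
import numpy as np

def propagate(R, user_emb, item_emb, num_layers=2):
    """Simplified GCN propagation on a user-item bipartite graph.

    R        : (n_users, n_items) binary interaction matrix
    user_emb : (n_users, d) initial user embeddings
    item_emb : (n_items, d) initial item embeddings; in a multimodal
               variant these could fuse ID embeddings with projected
               visual, acoustic, and textual features (assumption)
    Returns propagated (user, item) embeddings.
    """
    n_users, n_items = R.shape
    # Bipartite adjacency A = [[0, R], [R^T, 0]]
    A = np.zeros((n_users + n_items, n_users + n_items))
    A[:n_users, n_users:] = R
    A[n_users:, :n_users] = R.T
    # Symmetric normalization D^{-1/2} A D^{-1/2}
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    A_norm = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Layer-wise propagation; average all layer outputs
    E = np.vstack([user_emb, item_emb])
    layers = [E]
    for _ in range(num_layers):
        E = A_norm @ E
        layers.append(E)
    E_final = np.mean(layers, axis=0)
    return E_final[:n_users], E_final[n_users:]
```

A recommendation score for a user-item pair would then be the inner product of the two propagated embeddings; each extra layer mixes in CF signal from higher-order neighbors on the bipartite graph.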