Meta-relationship for course recommendation in MOOCs

Course recommendation helps students with different needs choose courses. However, students’ needs are not determined by personal interests alone; they are also influenced by curriculum settings, teaching teams and other factors. Current course recommendation methods overlook the complex relational semantic information that affects students’ needs, which leads to unsatisfactory recommendations. To address this issue, we propose Meta-Relationship Course Recommendation (MRCRec) to enrich the expression of relational information. Focusing on the complex semantic information of multi-entity relationships and entity associations, we construct the multi-entity relational self-symmetric meta-path (MSMP) and the associative relational self-symmetric meta-graph (ASMG), which are collectively referred to as the meta-relationship (MR). We also design a meta-relationship correlation measure (MRCor) algorithm to obtain semantic correlation information. Then, we adopt graph embedding to mine and fuse the latent representations of users and of courses as user preferences and course characteristics, respectively. Finally, we optimize matrix factorization to complete the recommendation task. Comprehensive experiments are conducted on the MOOCCube and XuetangX datasets. The results show that MRCRec can effectively recommend courses for users.


Introduction
In recent years, massive open online courses (MOOCs) have become increasingly popular among students [6]. Although MOOCs have grown in scale and attracted more students, they still face some difficulties. (i) Courses with the same name may differ in content and focus; for example, courses called data structure may focus either on basic concepts or on algorithmic ideas. (ii) Students' needs can be influenced by curriculum settings, teaching forms, teaching teams and other factors, which may affect how well students' interest is maintained. These issues can lead to inappropriate course selections and low course completion rates. Therefore, it is important to capture students' interests, to recommend appropriate courses so that students can study continuously and efficiently, and to draw teachers' attention to the level of online education.
One popular approach is collaborative filtering-based recommendation, which recommends courses based on the similarity of students [17] or the similarity of courses [18]. Although these methods are intuitive and highly explainable, they handle sparse vectors poorly, which results in an obvious head effect where popular courses dominate the recommendations. Another approach is matrix factorization-based recommendation, which alleviates the sparsity problem to a certain extent. Such methods obtain the implicit vectors of users and courses from the co-occurrence matrix [2,3,5,24]. When recommending courses for a user, the inner product of the implicit vectors gives the predicted score and the final recommendation list [20,21]. However, it is not convenient to add user, course and context relationship features to these methods, so they lose the opportunity to use additional valid information.
Recently, heterogeneous information networks (HINs) have been introduced to address the absence of context information in recommendation. Users, courses, other entities and their relationships are modeled as heterogeneous data to bring in effective information [10]. Usually, meta-paths are applied in the recommendation system as guidance for capturing the heterogeneous information [1]. Graph embedding is used to obtain deep features of semantic information based on paths [8,19,23,25,26], and is then extended to matrix factorization to carry out the subsequent recommendation task [9,12]. However, these methods do not consider the relational semantic information that affects students' needs as a feature of students' interests and courses' characteristics.
To solve the problems mentioned above, we propose Meta-Relationship Course Recommendation (MRCRec). The multi-entity relational self-symmetric meta-path (MSMP) and the associative relational self-symmetric meta-graph (ASMG), collectively referred to as the meta-relationship (MR), are constructed to exploit the rich heterogeneous relationships among entities. Considering the characteristics of MR, we design a meta-relationship correlation measure algorithm to obtain semantic correlation information for graph embedding. Then, we adopt graph embedding to learn and fuse the latent representations of users and of courses as user preferences and course characteristics, respectively. Finally, we optimize matrix factorization to obtain the recommendation list. We conduct comprehensive experimental studies on the MOOCCube and XuetangX datasets to evaluate the performance of MRCRec. We analyze the impact of MR composed of different types of entities. We also study the parameters, including the number of latent factors, the dropout rate and the number of GCN layers. Comparison with a series of recently published methods demonstrates the effectiveness of MRCRec. The main contributions of this paper can be summarized as follows: (i) MSMP with multiple relationships and ASMG with association are designed to capture more complicated heterogeneous semantic information. (ii) Based on the multi-entity relationship, the association relationship and self-symmetry, a meta-relationship correlation measure algorithm is designed to measure heterogeneous information.

Matrix factorization for recommendation
Matrix factorization (MF) is commonly used in recommendation methods. MF decomposes the user behavior matrix into two low-dimensional matrices: one holds the implicit factor vectors representing user preferences, and the other holds the implicit factor vectors representing item characteristics. Thai et al. [20] proposed to exploit multiple relationships between students and tasks using multi-relational MF methods. Kabbur et al. [5] presented an item-based method that makes recommendations based on the historical behaviors of the user and the target items. Elbadrawy et al. [21] used student and course academic features to define student and course groups, and then applied them to neighborhood-based user collaborative filtering and matrix factorization. With the development of deep learning, researchers are exploring the use of neural networks to extract deep features for matrix factorization. He et al. [2] proposed a neural attentive item-based method that uses an attention mechanism to distinguish the weights of different learning behaviors. He et al. [3] applied a multi-layer perceptron to the user and item representations to learn the probability of recommending items. Xue et al. [24] introduced deep item-based collaborative filtering to capture the deep features of higher-order item relations and designed attention modeling of second-order item relations. Such matrix factorization-based methods can deal with sparse vectors, but it is not convenient to add context features to them, so they miss the opportunity to use additional effective information.
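The inner-product scoring step shared by these MF methods can be illustrated with a minimal sketch. The latent vectors, course names and the helper names `predict_scores` and `top_n` are hypothetical and chosen only for illustration; in practice the vectors are learned from the user behavior matrix.

```python
def predict_scores(user_vec, course_vecs):
    """Inner product between one user's latent vector and every course vector."""
    return {
        course_id: sum(u * c for u, c in zip(user_vec, vec))
        for course_id, vec in course_vecs.items()
    }


def top_n(scores, n, seen=()):
    """Rank courses the user has not taken yet by predicted score, highest first."""
    unseen = [course_id for course_id in scores if course_id not in seen]
    return sorted(unseen, key=lambda course_id: scores[course_id], reverse=True)[:n]


# Toy latent factors (two factors per user/course), purely illustrative.
user_vec = [0.9, 0.1]
course_vecs = {
    "data_structures": [1.0, 0.0],
    "painting": [0.0, 1.0],
    "algorithms": [0.8, 0.3],
}

scores = predict_scores(user_vec, course_vecs)
recommended = top_n(scores, n=2, seen={"data_structures"})
print(recommended)
```

The sketch also makes the limitation discussed above visible: the score depends only on the two latent vectors, so there is no natural place to add context features.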

Meta-path-based graph embedding
To solve the problem of missing context information, heterogeneous information networks [8] have been introduced, and meta-paths are used as guidance to capture the heterogeneous information for the recommendation system [1]. The similarity information between meta-paths is calculated and then applied to extended matrix factorization for the subsequent recommendation task. Yu et al. [15] proposed a matrix factorization-based unified recommendation model using rating data and related meta-paths. Luo et al. [7] presented a social relation collaborative filtering recommendation algorithm using meta-paths. Shi et al. [11] presented a matrix factorization-based dual regularization framework that uses simple meta-paths to obtain the similarity of users and items. Zhao et al. [16] introduced the concept of the meta-graph to heterogeneous information networks, and then solved the information fusion with matrix factorization. In terms of course recommendation, Chen et al. [22] proposed a content-based top-N recommender system that uses PathSim to measure the similarity between users and courses. These methods take advantage of the additional effective information. However, they may not fully mine the latent features of users and items.
To fully mine the deep features of users and items, researchers have explored graph embedding. These methods adopt network embedding to obtain deep features of semantic information based on paths, and then extend to matrix factorization for recommendation. Shi et al. [8] designed a meta-path-based random walk strategy to embed heterogeneous information, and then integrated it into an extended matrix factorization model. In course recommendation, Zhu et al. [23] constructed a graph structure from information about students, courses and students' ratings, and used a random walk to obtain features. Piao [25] investigated two attention mechanisms for aggregating information from different meta-paths to predict and recommend concepts of interest to users. Gong et al. [19] constructed simple meta-paths and calculated the similarity of students or courses to embed heterogeneous information. Velickovic et al. [26] presented the deep graph infomax approach for learning node representations within graph-structured data in an unsupervised manner. These methods mine the deep features of users and courses, but lack the rich and complex relational semantic information that can affect the features of students' interests and courses' characteristics.
To address the above problems, we put forward meta-relationship course recommendation, which combines MSMP and ASMG to compensate for the lack of rich multi-entity relationships and complex entity relationships in the extraction of deep features.

Meta-relationship for course recommendation
Given different types of entities and the original behavioral relational data among them, the goal is to calculate preference scores between users and courses, and then recommend courses to users. In detail, given multiple types of behavioral relational data (e.g., users learn courses, users watch videos, teachers teach courses, etc.), a prediction function is learned to minimize the prediction error and generate a recommendation list of courses. The framework of MRCRec is shown in Fig. 1.

The construction of meta-relationship
To establish different types of entities and their rich relationships, we use a heterogeneous information network to depict user, course, other items and their corresponding relationships. Then we obtain the network schema [13] to mine the entity semantic relationships among a series of entities.
The network schema is denoted as $T_G = (E, R)$, where $E$ denotes the entity set and $R$ denotes the relationship set. The corresponding schema of MOOCs is given in Fig. 2 as an example.
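As a rough illustration, a network schema of this kind can be held as a set of entity types and a set of typed relationships. The sketch below is an assumption, not the schema of Fig. 2: the entity types follow the MOOCCube description, the relationships "learn", "watch", "teach" and "include" follow examples mentioned in the text, and the school relationship and the helper names `neighbours` and `is_valid_path` are hypothetical.

```python
from collections import defaultdict

# Hypothetical schema T_G = (E, R); the real schema is the one in Fig. 2.
ENTITIES = {"user", "course", "video", "concept", "teacher", "school"}
RELATIONS = {
    ("user", "learn", "course"),
    ("user", "watch", "video"),
    ("course", "include", "video"),
    ("course", "include", "concept"),
    ("teacher", "teach", "course"),
    ("school", "offer", "course"),  # assumed relationship name
}


def neighbours(relations):
    """Undirected adjacency between entity types, ignoring relationship names."""
    adjacency = defaultdict(set)
    for source, _name, target in relations:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def is_valid_path(entity_types, adjacency):
    """A meta-path is valid if every consecutive pair is linked in the schema."""
    return all(b in adjacency[a] for a, b in zip(entity_types, entity_types[1:]))


adjacency = neighbours(RELATIONS)
print(is_valid_path(["user", "course", "video", "course", "user"], adjacency))
print(is_valid_path(["user", "teacher"], adjacency))
```

Checking candidate paths against the schema in this way is one simple means of making sure that a meta-path only uses relationships that actually exist in the data.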

Definition 1 Multi-entity relational self-symmetric meta-path.
MSMP is a path defined on the network schema $T_G$ that consists of an entity set $E_{MP}$ and a relationship set $R_{MP}$, and is denoted as $MSMP = (E_{MP}, R_{MP})$. In this notation, $E^n_p$ is the p-th entity in the n-th meta-path and $R_{E^n_{(p-1)}E^n_p}$ is the relationship between entities $E^n_{(p-1)}$ and $E^n_p$.

Fig. 1 The framework of MRCRec. It uses behavioral relational data to construct MSMP and ASMG for both users and courses, which supplies rich and complex semantic information. MRCor then measures the correlation of users and that of courses for graph embedding. Finally, the deep features of users and of courses obtained by graph embedding are fused, respectively, and used to improve the matrix factorization for course recommendation.
Given the above definition, we construct rich semantics from various entity relationships, so that the semantic expression of the meta-path becomes sufficient. We show all MSMPs used for MOOCs in Fig. 3. $MSMP_1$ is a concrete example, in which a meta-path captures the correlation between users who both learn a course and watch a video that the course includes.
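The self-symmetry property can be checked mechanically: a path is self-symmetric when its sequence of entity types reads the same in both directions. The following sketch assumes a meta-path is given as a list of entity-type names with an odd number of entries (so that a single middle entity exists); the example path and the function names are hypothetical and are not taken from Fig. 3.

```python
def is_self_symmetric(entity_types):
    """True when the entity-type sequence is a palindrome of length >= 3."""
    types = list(entity_types)
    return len(types) >= 3 and types == types[::-1]


def left_half(entity_types):
    """Entities from the source up to and including the middle entity.

    Assumes an odd number of entities, as in user-course-video-course-user.
    """
    types = list(entity_types)
    return types[: len(types) // 2 + 1]


# Hypothetical meta-path in the spirit of the example above.
msmp_example = ["user", "course", "video", "course", "user"]
print(is_self_symmetric(msmp_example))
print(left_half(msmp_example))
```

Because the right half mirrors the left half, only the left half needs to be stored or computed, which is what makes the later commuting-matrix calculation cheaper.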
MSMP compensates for the insufficiency of single-entity relational semantics. However, a meta-path fails if we want to capture the semantics that User 1 and User 2 learn the same course and, at the same time, the courses involve the same type of aspect (such as video and concept). In other words, it neglects the associative semantic information. Based on the MSMP, we therefore propose to build the ASMG to extract these complex semantics.

Definition 2 Associative relational self-symmetric meta-graph.
ASMG is a directed acyclic graph defined on the network schema $T_G$, consisting of an entity set $E_{MG}$ and a relationship set $R_{MG}$, and is denoted as $ASMG = (E_{MG}, R_{MG})$. The m-th ASMG has a single source entity node $E^m_1$ (i.e., with in-degree 0) and a single target entity node $E^m_1$ (i.e., with out-degree 0), contains more than two types of entity relationships and describes a relationship set … is an intermediate entity in the i-th meta-path in the graph structure of a meta-graph, and $R_{E^m}$ …

We show all ASMGs used for MOOCs in Fig. 4. They are directed acyclic graphs with User or Course as the source node or the target node. $ASMG_1$ is a concrete example, in which a meta-graph captures the correlation between users who both learn courses that, at the same time, include the same video and concept. Note that here we collectively refer to MSMP and ASMG as MR.

The correlation measure of meta-relationship
Given the above construction of MR, we use a meta-relationship correlation measure algorithm to calculate the correlation between different objects of the same entity type, which serves as the heterogeneous information.
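First, the interactive information of the corresponding relationships is obtained through the peer-to-peer behavior connection, that is, based on whether a peer-to-peer behavior exists between objects of two different entity types. The interactive information set is $A^{MP_n} = \{A^n_{E_{(p-1)}E_p}\}$, where each element records whether objects of different entities have a peer-to-peer behavior. It is expressed as:

$$A^n_{E_{(p-1)}E_p}(a, b) = \begin{cases} 1, & \text{if } a \text{ and } b \text{ have a peer-to-peer behavior} \\ 0, & \text{otherwise} \end{cases}$$

where $A^n_{E_{(p-1)}E_p} \in \mathbb{R}^{k_{(p-1)} \times k_p}$ is the interactive information between entities $E^n_{(p-1)}$ and $E^n_p$, $k_{(p-1)}$ and $k_p$ indicate the numbers of objects of the entities $E^n_{(p-1)}$ and $E^n_p$, and $a$ and $b$ indicate objects of the two different entities, respectively.

Fig. 4 Associative relational self-symmetric meta-graphs in MOOCs. $ASMG_1$ denotes that two users are related through graphs containing different courses that include the same video and concept. $ASMG_2$, $ASMG_3$ and so on have similar expressions.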
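A minimal sketch of this step, assuming the behavioral data are available as (object, object) pairs such as (user, course): each pair that occurs sets the corresponding entry of a 0/1 matrix. The identifiers and the function name `interaction_matrix` are hypothetical.

```python
def interaction_matrix(pairs, row_ids, col_ids):
    """Binary matrix with entry 1 where a peer-to-peer behavior exists."""
    row_index = {row_id: i for i, row_id in enumerate(row_ids)}
    col_index = {col_id: j for j, col_id in enumerate(col_ids)}
    matrix = [[0] * len(col_ids) for _ in row_ids]
    for a, b in pairs:
        # Repeated behaviors still yield 1: only existence is recorded.
        matrix[row_index[a]][col_index[b]] = 1
    return matrix


users = ["u1", "u2", "u3"]
courses = ["c1", "c2"]
learns = [("u1", "c1"), ("u2", "c1"), ("u2", "c2"), ("u1", "c1")]

A_user_course = interaction_matrix(learns, users, courses)
print(A_user_course)
```

One such matrix is built per relationship on the path (for example user-course and course-video), and the matrices are then chained in the commuting-matrix step.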
Then, the commuting matrix for MR is calculated according to $A^{MP_n}$ or $A^{MG_m}$. For the MSMP, the commuting matrix is the product of the interactive information along the path:

$$C^{MP_n} = \prod_{p} A^n_{E_{(p-1)}E_p} \qquad (1)$$

with the product taken in path order over the consecutive entity pairs of $MP_n$. According to the associative law of matrix multiplication and the transpose rule, formula 1 can be rewritten as:

$$C^{MP_n} = C^{MP_n^L} \left( C^{MP_n^L} \right)^{T} \qquad (2)$$

where $MP_n^L$ denotes the symmetrical left half of $MP_n$, $C^{MP_n} \in \mathbb{R}^{k^n_1 \times k^n_1}$ is the commuting matrix of MSMP, and $k^n_1$ is the number of objects of the entity $E^n_1$.

For the meta-graph, the problem becomes more complex. When there are two paths, we can allow a flow to pass through either path, or constrain a flow to satisfy both of them [16]. The computation is therefore similar to splitting the meta-graph into multiple meta-paths, but it combines them differently from a single meta-path. For the ASMG with the interactive information set $A^{MG_m}$, formula 3 is used to calculate the commuting matrix of each meta-path in the graph structure, where $MG^m_i$ denotes the i-th meta-path in the meta-graph and the m-th ASMG has its own graph structure set $MG_m = \{MG^m_1, \cdots, MG^m_i, \cdots\}$ covering all of its meta-paths. The Hadamard product of these commuting matrices of the graph structure is then taken, and the commuting matrix of the ASMG, $C^{MG_m}$, is obtained from the result.

Because each element of the commuting matrix is a count of path connections, the commuting matrix should be normalized to a uniform scale. For the MR, it is a good choice to measure the degree of correlation between two different objects of the same type by the degree of divergence, shown as:

$$S^{MP_n}(a, b) = \frac{2\, C^{MP_n}(a \rightsquigarrow b)}{C^{MP_n}(a \rightsquigarrow a) + C^{MP_n}(b \rightsquigarrow b)}, \qquad S^{MG_m}(a, b) = \frac{2\, C^{MG_m}(a \rightsquigarrow b)}{C^{MG_m}(a \rightsquigarrow a) + C^{MG_m}(b \rightsquigarrow b)}$$

where $C^{MP_n}(a \rightsquigarrow b)$ or $C^{MG_m}(a \rightsquigarrow b)$ is the correlation degree of connected paths between $a$ and $b$, $C^{MP_n}(a \rightsquigarrow a)$ or $C^{MG_m}(a \rightsquigarrow a)$ is the correlation between $a$ and $a$, and $C^{MP_n}(b \rightsquigarrow b)$ or $C^{MG_m}(b \rightsquigarrow b)$ is that between $b$ and $b$. Here $a$ and $b$ indicate different objects of the same type. $S^{MP_n} \in \mathbb{R}^{k^n_1 \times k^n_1}$ is the correlation matrix of MSMP, and $k^n_1$ is the number of objects of the entity $E^n_1$.
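The correlation measure can be sketched on toy data as follows. This is an illustrative reading under stated assumptions, not the exact MRCor algorithm: the commuting matrix of a self-symmetric path is computed from its left half as a matrix times its own transpose, the meta-graph is handled by an element-wise (Hadamard) product of the per-path commuting matrices, and the normalization is a PathSim-style ratio with a factor of 2 in the numerator. All matrices and function names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def commuting_from_left_half(left_matrices):
    """C = C_L * C_L^T, with C_L the product of the left-half interaction matrices."""
    c_left = left_matrices[0]
    for matrix in left_matrices[1:]:
        c_left = matmul(c_left, matrix)
    return matmul(c_left, transpose(c_left))


def hadamard(a, b):
    """Element-wise product: a connection must exist on both paths to count."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def correlation(c):
    """Assumed normalization: S(a, b) = 2 C(a, b) / (C(a, a) + C(b, b))."""
    size = len(c)
    s = [[0.0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            denominator = c[a][a] + c[b][b]
            s[a][b] = 2.0 * c[a][b] / denominator if denominator else 0.0
    return s


# Toy data: 3 users x 2 courses, 2 courses x 1 video, 2 courses x 2 concepts.
user_course = [[1, 0], [1, 1], [0, 1]]
course_video = [[1], [1]]           # both courses include the same video
course_concept = [[1, 0], [0, 1]]   # the courses cover different concepts

c_users = commuting_from_left_half([user_course])                    # user-course-user
c_video = commuting_from_left_half([user_course, course_video])      # via video
c_concept = commuting_from_left_half([user_course, course_concept])  # via concept
c_graph = hadamard(c_video, c_concept)                                # video AND concept

s_users = correlation(c_users)
s_graph = correlation(c_graph)
print(s_users[0][1], s_graph[0][2])
```

In this toy case the first and third users are connected through the shared video but not through a shared concept, so the path-based count is positive while the graph-based count is zero, which is the kind of associative semantics that a single meta-path cannot express.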

Getting the recommendation list
Now, the correlation matrix $S^{MP_n}$ or $S^{MG_m}$ is used as the edge feature for MR. One-hot encoding is used to generate the node feature $X$. A multi-layer graph convolutional network (GCN) is built with the layer-wise propagation rule $h^{(l+1)} = \mathrm{ReLU}(P h^{(l)} W^{(l)})$, where $h^{(l)}$ is the node representation of an entity at layer $l$, and $h^{(0)}$ is the entity node feature $X$. $P = D^{-\frac{1}{2}} S D^{-\frac{1}{2}}$, where $S$ is the correlation matrix and $D$ is the degree matrix of $S$. Here, we write $S^{MP_n}$ or $S^{MG_m}$ as $S$. $W^{(l)}$ represents the trainable shared weight matrix of layer $l$. $\mathrm{ReLU}(\cdot)$ denotes the activation function, where $\mathrm{ReLU}(x) = \max\{0, x\}$.
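The propagation rule can be traced by hand with a small sketch. It assumes that the degree matrix is the diagonal matrix of row sums of the correlation matrix, uses fixed toy weights instead of trained ones, and omits dropout; the function names are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize(s):
    """P = D^(-1/2) S D^(-1/2), with D the diagonal matrix of row sums of S."""
    degrees = [sum(row) for row in s]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    size = len(s)
    return [[inv_sqrt[i] * s[i][j] * inv_sqrt[j] for j in range(size)] for i in range(size)]


def relu(m):
    return [[max(0.0, x) for x in row] for row in m]


def gcn_forward(s, x, weights):
    """Apply h <- ReLU(P h W) once per weight matrix, starting from h = x."""
    p = normalize(s)
    h = x
    for w in weights:
        h = relu(matmul(matmul(p, h), w))
    return h


# Toy correlation matrix for two users, one-hot node features, toy weights.
S = [[1.0, 0.5], [0.5, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, -1.0], [0.0, 2.0]]

h1 = gcn_forward(S, X, [W])        # one propagation layer
h3 = gcn_forward(S, X, [W, W, W])  # three propagation layers
print(h1)
```

With one-hot node features, the first layer simply mixes rows of the weight matrix according to the normalized correlations, so objects that are strongly correlated under a meta-relationship receive similar representations.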
After the three propagation layers, we obtain the final representation of MSMP or that of ASMG. However, different meta-relationships should not be treated equally. Therefore, the attention mechanism [19] is used to fuse the representations. According to whether the source entity of the meta-relationship is the recommendation object or the recommendation item, we divide the final entity representations and meta-relationships into two types. Here, we regard the user as the recommendation object and the course as the recommendation item. The final representations of user and course are then defined by formulas 8 to 11, in which $f^u_i$ or $f^c_i$ denotes the representation of the user or course based on the target meta-relationship, $f^u_j$ or $f^c_j$ is the representation based on the other meta-relationships, and $s_u$ or $s_c$ denotes the number of meta-relationships of the user or course. These formulas also use a trainable attention vector and a nonlinear function. According to formulas 8 to 11, we obtain the final representations of user $F_u$ and course $F_c$. Finally, the recommendation list is obtained by solving the objective function, in which $p_{u,c}$ is the interactive rating data derived from implicit feedback data [4] and which also contains a regularization parameter. By fully considering the user interest factor and the course characteristic factor, we use the rating predictor $r_{u,c} = x_u^{T} y_c + \beta_u F_u^{T} t_c + \beta_c t_u^{T} F_c$, where $x_u \in \mathbb{R}^{k_{LF} \times k_u}$ is the latent factors of the user, and $y_c \in \mathbb{R}^{k_{LF} \times k_c}$ is the latent factors of the course. $k_u$, $k_c$ and $k_{LF}$ indicate the number of users, the number of courses and the number of latent factors for each user or course. $\beta_u$ and $\beta_c$ are the tuning parameters. The trainable parameters $t_u$ and $t_c$ are used to ensure that $F_u$ and $F_c$ are in the same space.
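The rating predictor is a sum of three inner products and can be sketched directly. The names `beta_u` and `beta_c` are placeholders for the two tuning parameters, and all vectors below are toy values rather than learned parameters; the attention-based fusion that produces the fused user and course representations is not reproduced here.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_rating(x_u, y_c, f_u, f_c, t_u, t_c, beta_u, beta_c):
    """Latent-factor term plus two terms that bring in the fused embeddings.

    x_u, y_c : latent factors of the user and the course
    f_u, f_c : fused graph-embedding representations of the user and the course
    t_u, t_c : trainable vectors pairing with f_c and f_u respectively
    beta_u, beta_c : tuning parameters (placeholder names)
    """
    return dot(x_u, y_c) + beta_u * dot(f_u, t_c) + beta_c * dot(t_u, f_c)


rating = predict_rating(
    x_u=[1.0, 2.0], y_c=[0.5, 0.5],
    f_u=[1.0, 0.0, 1.0], f_c=[1.0, 1.0, 1.0],
    t_u=[0.1, 0.1, 0.1], t_c=[0.2, 0.3, 0.1],
    beta_u=0.5, beta_c=2.0,
)
print(rating)
```

Setting both tuning parameters to zero reduces the predictor to plain matrix factorization, which makes it easy to see how much the meta-relationship embeddings contribute.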

Datasets and evaluation metrics
To evaluate the effectiveness of the proposed method, we adopt two datasets, MOOCCube and XuetangX. MOOCCube [14] is a large-scale and high-coverage data repository that contains 199,199 users, 706 courses, 38,181 videos, 114,563 course concepts, 208 schools, and 1,738 teachers. From it, we collect entities and rich relational data (i.e., user↔course, user↔video, course↔concept, etc.). We divide the learning behavior of students into a training set and a test set with a ratio of 8:2. Each instance in the training set or test set represents a user's historical behavior of learning a course. XuetangX contains 9,986 real users, 7,020 courses, 43,405 videos and 1,029 concepts. In addition, the dataset includes relationships such as user↔course, user↔video, user↔concept and so on. It includes a training set with data occurring between October 1st, 2016 and December 30th, 2017 and a test set with data occurring between January 1st, 2018 and March 31st, 2018. Because MOOCCube has richer entity relationships, we use MOOCCube to analyze the combination of meta-relationships, the number of path relationships, and the model parameters. In addition, to better verify the proposed method, a comparative analysis with other methods is conducted on both datasets.
Four kinds of common metrics are used to evaluate the methods [19]. Hit Ratio of top-K items (HR@K) is a recall-based metric that shows the percentage of ground truth instances that appear in the top K. Normalized Discounted Cumulative Gain of top-K items (NDCG@K) is a precision-based metric that measures the predicted position of the ground truth. We set K equal to 5 and 10. Mean Reciprocal Rank (MRR) is an accuracy metric that measures the average reciprocal rank of the ground truth. In addition, we use the area under the ROC curve (AUC) to evaluate the methods.
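The three ranking metrics can be computed as in the sketch below, which assumes one ground-truth course per test case and a ranked list of candidate courses for each case. This single-item setting is an assumption made for illustration; AUC is left out for brevity, and the function names are hypothetical.

```python
import math


def hit_ratio_at_k(ranked, truth, k):
    """1 if the ground-truth course appears in the top K, else 0."""
    return 1.0 if truth in ranked[:k] else 0.0


def ndcg_at_k(ranked, truth, k):
    """With a single relevant item, the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    if truth in ranked[:k]:
        position = ranked.index(truth)  # 0-based position in the list
        return 1.0 / math.log2(position + 2)
    return 0.0


def reciprocal_rank(ranked, truth):
    """1 / rank of the ground truth, or 0 if it is not ranked at all."""
    return 1.0 / (ranked.index(truth) + 1) if truth in ranked else 0.0


def evaluate(cases, k):
    """Average each metric over (ranked_list, ground_truth) test cases."""
    count = len(cases)
    return {
        "HR": sum(hit_ratio_at_k(r, t, k) for r, t in cases) / count,
        "NDCG": sum(ndcg_at_k(r, t, k) for r, t in cases) / count,
        "MRR": sum(reciprocal_rank(r, t) for r, t in cases) / count,
    }


cases = [
    (["c3", "c1", "c2"], "c1"),  # ground truth ranked second
    (["c2", "c3", "c1"], "c1"),  # ground truth ranked third
]
metrics = evaluate(cases, k=2)
print(metrics)
```

HR@K only records whether the ground truth is found, whereas NDCG@K and MRR also reward placing it closer to the top of the list.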

Evaluation of different meta-relationships combination
The effects of MSMP and ASMG are studied first. In Table 1, MRCRec$_{MP}$ means that MSMP is used in MRCRec, MRCRec$_{MG}$ means that ASMG is used in MRCRec, and MRCRec$_{MR}$ means that the combination of MSMP and ASMG is used in MRCRec. MRCRec$_{MR}$ achieves the best result, because MR takes into account the overall complex semantic information of both the multi-entity relationship and the associative relationship. In Table 2, we discuss the association between entities of MSMP, ASMG and MR for the user. The subscript (all) means that all combinations of user and course are included. Here, we keep the same combinations for course and discuss the effects of different combinations for user. MP(u: v, k) contains the combinations with video and with concept individually, while MG(u: v + k) and MR(u: v + k) contain the association between video and concept. MP(u: s, t) contains the combinations with teacher and with school individually, while MG(u: s + t) and MR(u: s + t) contain the association between teacher and school. For MP, the combination of video and concept is better than that of school and teacher. However, for ASMG and MR, the association of school and teacher is better than that of video and concept in both cases. That is to say, associating school and teacher is more helpful for recommending courses.
In Table 3, we discuss the richness of entity combinations of MSMP and MR for the course. The subscript (all) means that all combinations of user and course are included. Here, we keep the same combinations for user and discuss the effects of different combinations for course. We regard user and course as primary entities. MP(c) and MR(c) contain only primary entities. MP(c: v) contains primary entities and video, and MR(c: v + k) contains primary entities and the association between video and concept. For the MP, the combination with video performs better than that with only primary entities. As for the MR, the combination with video and concept performs better than that with only primary entities. This is because the semantic information expressed with only primary entities is limited, whereas the semantic information expressed by MR with video and concept is relatively rich. In addition, the primary entities introduce information redundancy in the complex combination MR(all). Therefore, MR(c: v + k) shows the best performance among all combinations for MR under the evaluation of HR, NDCG and MRR, and it is used as the representative result of the proposed MRCRec.

Evaluation of number of path relationships
In this part of the experiments, we conduct ablation experiments to determine whether the number of path relationships between entities affects the performance of the method. Table 4 shows that the result is the worst when the numbers of entity types for user and for course are both two, and the best when both are more than two. When the number of entity types for user is two and that for course is more than two, the AUC is about 0.1% higher than when both are two. Similarly, when the number of entity types for user is more than two and that for course is two, the AUC is about 0.56% higher than when both are two. According to these results, the number of path relationships among entities affects the performance of the method: reducing the number of path relationships for user or course degrades the performance of the network. We conclude that rich semantic information about entity relationships can enhance the effectiveness of the approach.

Evaluation of model parameters
In this part of the experiments, we study the number of latent factors, the dropout rate, and the number of GCN layers. In matrix factorization, the number of latent factors is an important parameter. Therefore, we vary the number of latent factors to compare the performance of the models, setting it from 10 to 40 in steps of 10. Figure 5 shows the performance of the different methods for each number of latent factors. Among the three methods, MRCRec is significantly better than MRCRec$_{MP}$ and MRCRec$_{MG}$ when the number of latent factors is 30. The results of the MRCRec methods show a similar trend across the different settings of latent factors.
To analyze the effect of the dropout rate, we vary it from 0 to 0.9. Fig. 6 shows that MRCRec is superior to MRCRec$_{MP}$ and MRCRec$_{MG}$ across the different dropout rates. The results of the MRCRec methods show a similar trend for different dropout rates. The performance is best when the dropout rate is 0.5, so the dropout rate of MRCRec can be set to 0.5. Table 5 analyzes the effects of the GCN with different numbers of layers. The result manifests that a three-layer

Comparison with other methods
To evaluate the performance of MRCRec, we compare it with FISM [5], MLP [3], NAIS [2], DGI [26], ACKRec [19] and MOOCIR [25]. FISM [5] is a collaborative filtering method that makes recommendations based on the historical behaviors of the user and the target items. MLP [3] and NAIS [2] combine matrix factorization with a multi-layer perceptron or an attention mechanism to recommend items. DGI [26], ACKRec [19] and MOOCIR [25] use meta-paths to guide graph representation for users and courses. For FISM, MLP and NAIS, we construct the user-course rating matrix from the dataset as the historical behavior. For DGI, ACKRec and MOOCIR, we construct the user features, course features and their corresponding relationship features as inputs. We select appropriate parameters, including an embedding size of 16 and a learning rate of 0.01, to obtain the results of FISM, MLP and NAIS. For DGI, ACKRec and MOOCIR, we set the dropout rate to 0.5 and the number of latent factors to 30. In addition, we set the learning rate to 0.01 and apply an exponential learning rate decay every 100 steps. To be fair, we use the same parameters as DGI, ACKRec and MOOCIR in the experiments for MRCRec.
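An exponential learning rate decay applied every 100 steps can be sketched as a staircase schedule. The base learning rate of 0.01 and the 100-step interval come from the settings above; the decay factor of 0.96 and the staircase form are assumptions made only for illustration, since the text does not state them.

```python
def decayed_learning_rate(step, base_lr=0.01, decay_rate=0.96, decay_steps=100):
    """Staircase exponential decay: the rate drops once every `decay_steps` steps.

    decay_rate=0.96 is an assumed value; only base_lr and decay_steps
    are taken from the experimental settings.
    """
    return base_lr * decay_rate ** (step // decay_steps)


for step in (0, 99, 100, 250):
    print(step, decayed_learning_rate(step))
```

With these values the rate stays at 0.01 for the first 100 steps and is then multiplied by the decay factor at each subsequent 100-step boundary.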
Table 6 shows that MRCRec performs much better than FISM, MLP, NAIS, DGI, ACKRec and MOOCIR on the MOOCCube dataset. The AUC of MRCRec is about 3.84% to 8.30% higher than theirs, and the MRR of MRCRec is about 1.41% to 17.38% higher. The HR@5 of MRCRec is about 5.24% higher than that of DGI, and the HR@10 of MRCRec is about 8.05% higher than that of DGI. Compared with MOOCIR, the HR@5 of MRCRec is about 4.10% higher and the HR@10 is 3.74% higher. The HR@5 of MRCRec is about 5.80% higher than that of ACKRec, and the HR@10 of MRCRec is about 13.90% higher than that of ACKRec. In terms of NDCG, the NDCG@5 of MRCRec is about 1.99% to 5.40% higher than that of ACKRec, MOOCIR and DGI, and the NDCG@10 of MRCRec is 4.03% to 6.27% higher than that of MOOCIR, ACKRec and DGI.
As can be seen from Table 7, MRCRec also outperforms the other six approaches on the XuetangX dataset. The AUC of MRCRec is about 0.23% to 5.83% higher than theirs, and the MRR of MRCRec is about 0.12% to 18.73% higher. Compared with DGI, the HR@5 of MRCRec is about 0.18% higher and the HR@10 is about 0.47% higher. The HR@5 of MRCRec is 0.84% higher than that of MOOCIR, and the HR@10 of MRCRec is 0.27% higher than that of MOOCIR. Compared with ACKRec, the HR@5 of MRCRec is about 0.20% higher and the HR@10 is about 1.59% higher. In terms of NDCG, the NDCG@5 of MRCRec is about 0.2% to 1.43% higher than that of ACKRec, DGI and MOOCIR, and the NDCG@10 of MRCRec is 0.2% to 0.64% higher than that of MOOCIR, DGI and ACKRec.
The results on the two datasets suggest that DGI, ACKRec and MOOCIR can only employ limited semantic information through simple paths among entities. In contrast, the MR designed in MRCRec can explore the rich and complicated relationships among entities, which gives a better representation of users and courses and leads to more accurate recommendation. Another point that can be drawn from the results is that MRCRec improves on the existing methods more on MOOCCube than on XuetangX. The reason is that MOOCCube has richer relational data, which makes the semantic information expressed by the constructed meta-relationships richer.

Conclusion
This paper studies the issue of course recommendation in MOOCs. Meta-Relationship Course Recommendation (MRCRec) is proposed to address the insufficient semantic expression of entity relationships. To verify the effectiveness of this method, we analyze the impact of MR composed of different types of entities, study the parameters of the model, and compare it with a series of recently published methods. From the results, we can conclude that, besides the video and concept of a course, the association between school and teacher is very helpful for recommendation. The experiments also show that MRCRec is better suited to course recommendation because it can explore sufficient relations and semantics. In future work, the graph representation will be studied further on the basis of the meta-relationships extracted by MRCRec.