Translational models interpret relations as simple translations over latent entity representations. TransE [4] is one of the most common translational models, in which both entities and relations are represented as vectors in the same space. It can model inversion and composition patterns. Despite its simplicity, however, TransE does not perform well on one-to-many, many-to-one, and many-to-many relations [5, 6]. Although some more complex models handle this issue, they are not efficient. For example, for the relation WriterOf, TransE might learn similar vector representations for Harry Potter, Fantastic Beasts and Where to Find Them, and The Ickabog, which are all books by J. K. Rowling, even though these entities are entirely different. To overcome this issue, extensions of TransE including TransH [5], TransR [6], TransD [7], TransM [8], and TransW [9] have been proposed, each with its own relation embeddings and scoring function.
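For reference, TransE represents a triple $(h, r, t)$ by vectors $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{R}^d$ and scores it by the distance
\[ f_r(h, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{1/2}, \]
so that valid triples should have a small distance between the translated head $\mathbf{h} + \mathbf{r}$ and the tail $\mathbf{t}$.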
TransH [5] models a relation as a translation operation on a relation-specific hyperplane, with almost the same complexity as TransE. In this model, each relation is represented by two vectors: the normal vector of the hyperplane and the translation vector on the hyperplane. TransH addresses the issue of 1-to-N, N-to-1, and N-to-N relations by allowing each entity to have distinct distributed representations when involved in different relations. Experiments on link prediction, triplet classification, and fact extraction on benchmark datasets such as WordNet and Freebase show improvements over TransE.
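Concretely, TransH first projects the head and tail onto the hyperplane of relation $r$ with unit normal vector $\mathbf{w}_r$ and then translates by $\mathbf{d}_r$:
\[ \mathbf{h}_\perp = \mathbf{h} - \mathbf{w}_r^\top \mathbf{h}\, \mathbf{w}_r, \qquad \mathbf{t}_\perp = \mathbf{t} - \mathbf{w}_r^\top \mathbf{t}\, \mathbf{w}_r, \qquad f_r(h, t) = \lVert \mathbf{h}_\perp + \mathbf{d}_r - \mathbf{t}_\perp \rVert_2^2. \]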
TransE and TransH both place entities and relations in the same semantic vector space. However, an entity may have multiple aspects and relations, and each relation may focus on a particular aspect of an entity that is far from the others. Moreover, entities and relations are different kinds of objects, which can make them unsuitable for representation in the same vector space. TransR [6] therefore builds entity and relation embeddings in separate spaces and performs the translation in the corresponding relation space, using a projection matrix that maps entities from the entity space to the relation space. Comparisons between TransR and the two previously introduced models show significant improvements on link prediction, triple classification, and relational fact extraction.
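Concretely, with a relation-specific projection matrix $\mathbf{M}_r \in \mathbb{R}^{k \times d}$, TransR projects entities into the relation space and translates there:
\[ \mathbf{h}_r = \mathbf{M}_r \mathbf{h}, \qquad \mathbf{t}_r = \mathbf{M}_r \mathbf{t}, \qquad f_r(h, t) = \lVert \mathbf{h}_r + \mathbf{r} - \mathbf{t}_r \rVert_2^2. \]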
TransD [7] uses two vectors to represent each entity and each relation: one captures the meaning of the entity or relation, and the other is used to construct a mapping matrix dynamically, so that each entity-relation pair has its own mapping matrix. In this way, TransD captures the diversity of both entities and relations. It simplifies TransR by eliminating matrix-vector multiplication operations and has fewer parameters, which makes it more applicable to large-scale graphs and improves performance. In evaluations on link prediction and triplet classification, TransD outperforms the previously mentioned models.
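Concretely, with projection vectors $\mathbf{h}_p, \mathbf{t}_p, \mathbf{r}_p$, the dynamic mapping matrices and score are
\[ \mathbf{M}_{rh} = \mathbf{r}_p \mathbf{h}_p^\top + \mathbf{I}, \qquad \mathbf{M}_{rt} = \mathbf{r}_p \mathbf{t}_p^\top + \mathbf{I}, \qquad f_r(h, t) = \lVert \mathbf{M}_{rh} \mathbf{h} + \mathbf{r} - \mathbf{M}_{rt} \mathbf{t} \rVert_2^2, \]
where the identity term means each projection can be computed with vector operations alone, e.g. $\mathbf{M}_{rh}\mathbf{h} = \mathbf{r}_p (\mathbf{h}_p^\top \mathbf{h}) + \mathbf{h}$, rather than full matrix-vector multiplications.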
TransM [8] leverages the structure of the knowledge graph by pre-calculating a distinct weight for each training triplet according to the mapping property of its relation, and its objective function treats each triplet according to its own weight. The translation model for triplets is the same as in TransE, but the proposed objective uses the pre-calculated relation-specific weights. The main difference from TransE is that TransM is more flexible when dealing with the heterogeneous mapping properties of KGs, minimizing a weighted margin-based hinge loss. TransM outperforms TransE on link prediction and triplet classification.
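The resulting score is a weighted TransE distance,
\[ f_r(h, t) = w_r \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert, \]
where $w_r$ is pre-computed from the mapping statistics of relation $r$ (its average numbers of heads per tail and tails per head), so that 1-to-N, N-to-1, and N-to-N relations impose a looser translation constraint than 1-to-1 relations.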
More recently, TransW [9] proposed using word embeddings for knowledge graph embedding to better deal with unseen entities and relations. Unlike previous works, which ignore the words inside triples, TransW aims to enrich a KG with missing entities and relations using word embeddings. Composing entity and relation embeddings as linear combinations of word embeddings enables the model to detect unknown facts; the embeddings of entities and relations are computed separately using the Hadamard product. TransW outperforms previous translational approaches.
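Schematically (the exact formulation in [9] may differ in its details), an entity or relation whose name contains words $w_1, \dots, w_n$ is embedded as a Hadamard-weighted combination of the corresponding word vectors,
\[ \mathbf{e} = \sum_{i=1}^{n} \mathbf{w}_i \odot \mathbf{c}_i, \]
where $\odot$ is the Hadamard product and the $\mathbf{c}_i$ are learned coefficient vectors; the composed embeddings are then scored with a TransE-style translation, which allows embeddings for unseen entities to be built from their constituent words.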
RotatE [10] is another translation-based approach to KG representation learning, able to infer several relation patterns: symmetry, antisymmetry, inversion, and composition. RotatE defines each relation as a rotation from the source entity to the target entity in complex vector space. Some relations are symmetric (e.g., marriage) and some are antisymmetric (e.g., filiation); some relations are the inverse of others (e.g., hypernym and hyponym); and some are composed of others (e.g., my dad's wife is my mom). Inferring these patterns from observed facts is essential for predicting missing links, and unlike the models mentioned above, RotatE can model and infer all of them at the same time.
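Concretely, RotatE embeds $h, r, t$ in $\mathbb{C}^d$, constrains each relation coordinate to the unit circle ($|r_i| = 1$), and expects $\mathbf{t} = \mathbf{h} \circ \mathbf{r}$, giving the distance
\[ f_r(h, t) = \lVert \mathbf{h} \circ \mathbf{r} - \mathbf{t} \rVert, \]
where $\circ$ is the element-wise (Hadamard) product; a relation is symmetric exactly when $r_i = \pm 1$, inverse relations are complex conjugates of each other, and a composed relation is the product of its components' rotations.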
HAKE [11] is a translational distance model with some similarities to RotatE [10]. Unlike RotatE, however, HAKE aims to model the semantic hierarchy of a KG rather than relation patterns. Whereas RotatE models relations as rotations, which map entities to points with the same modulus, HAKE explicitly models modulus information to separate entities at different levels of the hierarchy.
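Concretely, HAKE maps entities to polar coordinates, with a modulus part that captures the level in the hierarchy and a phase part that distinguishes entities at the same level:
\[ d_{r,m}(h, t) = \lVert \mathbf{h}_m \circ \mathbf{r}_m - \mathbf{t}_m \rVert_2, \qquad d_{r,p}(h, t) = \left\lVert \sin\!\left(\tfrac{\mathbf{h}_p + \mathbf{r}_p - \mathbf{t}_p}{2}\right) \right\rVert_1, \]
and the final distance combines the two as $d_r(h, t) = d_{r,m}(h, t) + \lambda\, d_{r,p}(h, t)$, where $\lambda$ weights the phase term.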