This section presents our method for recommending scientific articles, which addresses the limitations discussed in Section 2. The first step selects the topic modeling algorithm that generates the most coherent topic model, which then serves as the basis for the recommendation process; the output of this step is a reference matrix of articles and topics. To this end, we compare two of the most popular topic modeling algorithms, namely latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF) [17].
In the second step, the recommendation process computes semantic similarities to generate a list of relevant articles, which is then presented to the target researcher (Figure 1).
Notation and approach
We denote by \(A=\{A_1, A_2, \dots, A_n\}\) the set of target researchers, by \(A_i\) a generic researcher in \(A\), and by \(D=\{d_1, d_2, \dots, d_m\}\) our corpus, which contains the articles that could potentially interest a generic researcher.
The target article \(d_j\) is represented by a topic distribution and associated with its predominant topic (obtained by applying the better-performing of the two algorithms, LDA or NMF).
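As a minimal sketch, assuming the selected topic model returns for each article a vector of topic weights (one row of the article/topic reference matrix), the predominant topic is simply the index of the largest weight. The function name `predominant_topic` and the example distribution are hypothetical illustrations, not part of the original method.

```python
from typing import Sequence


def predominant_topic(distribution: Sequence[float]) -> int:
    """Return the index of the topic with the largest weight.

    `distribution` is assumed (hypothetically) to be one row of the
    article/topic reference matrix, i.e. the topic weights of one article.
    Ties are broken in favour of the lowest topic index, because max()
    returns the first maximal element it encounters.
    """
    if not distribution:
        raise ValueError("empty topic distribution")
    return max(range(len(distribution)), key=lambda k: distribution[k])


# Hypothetical topic distribution of a target article d_j over K = 4 topics.
theta_dj = [0.10, 0.55, 0.30, 0.05]
topic_d = predominant_topic(theta_dj)
print(topic_d)  # → 1
```

Because the argmax is unaffected by rescaling, the same function also applies to the unnormalized topic weights produced by NMF.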
Our recommendation algorithm aims to present a target researcher \(A_i\) with a list of the articles most relevant and similar to the target article. The proposed approach consists of the following two steps.
Step 1:
- Apply and evaluate the LDA and NMF algorithms on the experimentation corpus with different combinations of hyperparameters, in order to select the best-performing algorithm, which we call algorithm_1.
- Reference the recommendation corpus by applying algorithm_1, so that each article is represented by its predominant topic.
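Step 1 can be sketched as follows. The section names coherence as the selection criterion but does not state which coherence measure is used, so this sketch assumes the UMass coherence computed from document co-occurrence counts. Each candidate model (one algorithm with one hyperparameter setting) is represented only by its top words per topic; in practice these would come from actually training LDA and NMF. The names `umass_coherence`, `select_best_model`, and `reference_corpus`, as well as the toy corpus and candidates, are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def umass_coherence(top_words: Sequence[str], docs: Sequence[Set[str]]) -> float:
    """UMass coherence of one topic (assumed measure; higher is more coherent).

    top_words are ordered by decreasing weight; each doc is a set of words.
    Score = sum over i > j of log((D(w_i, w_j) + 1) / D(w_j)).
    """
    def doc_freq(*words: str) -> int:
        # Number of documents that contain all of the given words.
        return sum(1 for doc in docs if all(w in doc for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                raise ValueError(f"word {top_words[j]!r} does not occur in the corpus")
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


def select_best_model(candidates: Dict[str, List[List[str]]],
                      docs: Sequence[Set[str]]) -> str:
    """Return the candidate whose topics have the highest mean coherence (algorithm_1)."""
    def mean_coherence(topics: List[List[str]]) -> float:
        return sum(umass_coherence(t, docs) for t in topics) / len(topics)

    return max(candidates, key=lambda name: mean_coherence(candidates[name]))


def reference_corpus(doc_topic: Dict[str, Sequence[float]]) -> Dict[int, List[str]]:
    """Index every article by its predominant topic (topic id -> article ids)."""
    index: Dict[int, List[str]] = defaultdict(list)
    for article_id, weights in doc_topic.items():
        best = max(range(len(weights)), key=lambda k: weights[k])
        index[best].append(article_id)
    return dict(index)


# Toy experimentation corpus: each document reduced to its set of words.
docs = [
    {"topic", "model", "lda"},
    {"topic", "model", "nmf"},
    {"citation", "graph", "network"},
    {"graph", "network", "node"},
]
# Hypothetical candidates: top words per topic for each algorithm/hyperparameter setting.
candidates = {
    "LDA(K=2)": [["topic", "model"], ["graph", "network"]],
    "NMF(K=2)": [["topic", "graph"], ["model", "network"]],
}
algorithm_1 = select_best_model(candidates, docs)
print(algorithm_1)  # → LDA(K=2)

# Hypothetical article/topic weights produced by algorithm_1 on the recommendation corpus.
doc_topic = {"d1": [0.9, 0.1], "d2": [0.2, 0.8], "d3": [0.6, 0.4]}
print(reference_corpus(doc_topic))  # → {0: ['d1', 'd3'], 1: ['d2']}
```

Here the LDA candidate wins because its top words co-occur in the same documents, whereas the NMF candidate mixes words that never appear together. Any other coherence measure could be swapped into `select_best_model` without changing the selection logic.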
Step 2:
- Accept the researcher's query to identify the target article, designated \(d_j\).
- Retrieve all the metadata of the target article \(d_j\) (specifically, from Google Scholar) and apply algorithm_1 to it; the target article \(d_j\) is then referenced by its predominant topic, designated topic_d.
- For each article \(d_i\) referenced by topic_d, with \(d_i \in D\) and \(d_i \ne d_j\), do:
  - Compute \(\mathrm{Similarity}(d_i, d_j)\) using Eq. (3).
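The loop of Step 2 can be sketched as below. Eq. (3) is not reproduced in this section, so the sketch substitutes cosine similarity between topic distributions as a stand-in; this is an assumption, not the paper's similarity measure. Metadata retrieval from Google Scholar is omitted, the topic index is assumed to map each topic to the articles it references, and the names `recommend`, `cosine`, `top_n`, and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Stand-in for Eq. (3) (assumption): cosine similarity of two topic distributions.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(target_id: str,
              target_dist: Sequence[float],
              topic_d: int,
              index: Dict[int, List[str]],
              doc_topic: Dict[str, Sequence[float]],
              top_n: int = 5) -> List[Tuple[str, float]]:
    """Rank the articles referenced by topic_d (excluding d_j) by similarity to d_j."""
    scored = [
        (d_i, cosine(doc_topic[d_i], target_dist))
        for d_i in index.get(topic_d, [])
        if d_i != target_id  # enforce d_i != d_j
    ]
    # Most similar first; ties keep corpus order because Python's sort is stable.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical topic weights and topic index for a small corpus D.
doc_topic = {"d1": [0.9, 0.1], "d2": [0.2, 0.8], "d3": [0.6, 0.4], "d4": [0.7, 0.3]}
index = {0: ["d1", "d3", "d4"], 1: ["d2"]}

# Target article d_j = "d1", whose predominant topic is 0.
ranking = recommend("d1", doc_topic["d1"], 0, index, doc_topic, top_n=2)
print([d for d, _ in ranking])  # → ['d4', 'd3']
```

Restricting the candidates to the articles referenced by topic_d keeps the number of similarity computations proportional to the size of one topic rather than to the whole corpus \(D\).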