Intelligent Tutoring System: A Bibliometric Analysis and Systematic Literature Review

This article presents a bibliometric analysis and literature review centered on evaluation mediated by Intelligent Tutoring Systems (ITS) in education, seeking to establish the state of the art of the implementations carried out over the last 42 years and their impact on the evaluation process. It was based on a bibliometric analysis of 1,890 abstracts, which allowed the main information sources in the field to be established. The first filter was carried out with R software and bibliometric techniques, using a general search equation that gave access to all ITS production registered in Scopus; this analysis used keywords and abstracts. Subsequently, with the help of artificial intelligence, text mining was used to identify topics of interest in the scientific community, followed by a new filtering. Finally, the selected full texts were analyzed with NVIVO software to extract emerging challenges in the field, yielding 164 complete texts for analysis. Among the main findings: the primary purpose of evaluation in ITS was summative, peer evaluation and self-evaluation did not have the same level of importance as hetero-evaluation, and the ITS focus was quantitative; all of this led us to conclude that the analyzed texts did not implement a holistic perspective.

Furthermore, tasks related to the subjects' individual processes are complex even for a conventional number of students, since evaluative processes at this level of personalization require a time investment from the educational actors that does not correspond to the implementation model (maximizing the number of participants while minimizing tutors).
Thinking about these tasks for massive groups requires an intelligent data-processing system that learns from the data and acts as a virtual tutor, performing accurate, decision-making evaluation. However, the approaches to this problem are still under development. Fundamental variables have been considered [14] [15]; for example, students' self-regulation or motivation have been included in some ITS. However, aspects such as diagnostic, formative, and summative evaluation have not been considered together. Therefore, a systematic review was carried out to identify and evaluate articles that propose implementations of evaluation systems using machine learning techniques for massive volumes of data.

Methodology

A funnel system is proposed to access a broad spectrum of information and to have an objective view of it, with three filtering moments to select the complete papers included for analysis (see Fig 3).
The first filter was made with R software and bibliometric techniques. Then, a general search equation allowed access to the all-time production on ITS registered in Scopus (only papers were selected). This analysis was carried out using keywords and abstracts.
Subsequently, with the help of artificial intelligence, text mining was used to identify topics of interest in the scientific community, followed by a new filtering.
The selected full texts were analyzed with NVIVO software to extract emerging challenges in the field. This study aims to answer the following questions: Q1: What is the ITS primary evaluation purpose? Q2: What is the main evaluating agent (in evaluation processes)?
Q3: What is the main approach used in the selected ITS?
Q4: Is the ITS evaluation process implemented holistically?
These questions arise from the need to understand evaluation in the context of learning, in particular deep learning. Specifically, a holistic and complex evaluation that can account for the student's capacity for critical analysis of new ideas and their integration with previous knowledge, thus favoring long-term understanding and retention that can later be used to solve problems in different contexts.
An evaluation that accounts for summative aspects, but also for levels of cognitive skill such as "analysis" (comparing, contrasting) and "synthesis" (integrating knowledge in a new dimension), integrated with metacognitive aspects that promote understanding and the application of lifelong learning, can be considered a holistic evaluation.

Bibliometric analysis:
With the search equation *intelligent tutoring*, the results presented in Table 1 were obtained. It is crucial to bear in mind that this general equation is only a starting point, since it is expected to yield new filtering criteria that will lead to a more refined equation. A total of 1,890 results were found in Scopus, covering 42 years of academic production. The texts considered were articles published in specialized journals; although it is recognized that this field of knowledge has important dissemination through conferences, given the study's objective of identifying structured knowledge with a high level of depth, conference papers were not included in this analysis. Thus, a total of 3,819 authors were considered in this initial search.
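As an illustration of this first filter, the selection step can be sketched in a few lines of Python, assuming the Scopus export has been loaded as a list of record dictionaries (the field names `doc_type`, `year`, and `authors` below are illustrative, not the actual Scopus or Bibliometrix column names):

```python
# Sketch of the first (bibliometric) filter: keep only journal articles
# from a Scopus export and count the unique authors retained.
# Record structure and field names are illustrative.

def first_filter(records):
    """Keep journal articles only, mirroring the exclusion of conference papers."""
    return [r for r in records if r["doc_type"] == "Article"]

def unique_authors(records):
    authors = set()
    for r in records:
        authors.update(r["authors"])
    return authors

records = [
    {"doc_type": "Article", "year": 2014, "authors": ["A", "B"]},
    {"doc_type": "Conference Paper", "year": 2014, "authors": ["C"]},
    {"doc_type": "Article", "year": 1979, "authors": ["B", "D"]},
]

kept = first_filter(records)
print(len(kept), sorted(unique_authors(kept)))  # 2 ['A', 'B', 'D']
```

In the actual study this selection was performed in R with bibliometric techniques; the sketch only mirrors the logic of restricting the corpus to journal articles before the author count.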
Academic production began in 1979, reached its maximum in 2014 (105 papers), and has slightly decreased since 2016. The main authors by total citations in the chosen period are presented in Fig 7. For example, Kenneth R. Koedinger, professor of human-computer interaction and psychology at Carnegie Mellon University and founding and current director of the Pittsburgh Learning Science Center, has 2,112 citations.
The data represented in Fig 8 are the KeyWords Plus counts. KeyWords Plus terms are generated from words or phrases that frequently appear in the titles of an article's references but do not appear in the article's title; they can be obtained using R and the Bibliometrix package. KeyWords Plus enhances the power of cited-reference searching by looking across disciplines for all articles with commonly cited references.
Garfield claimed that KeyWords Plus terms could capture an article's content with greater depth and variety [16]. KeyWords Plus is as effective as Author Keywords for bibliometric analyses investigating the knowledge structure of scientific fields, but it is less comprehensive in representing an article's content [17].
In Fig 8, computer-aided instruction stands out as the main topic, representing 17% of the frequencies examined in the text references. Finally, for the elaboration of Fig 9, it was considered that the co-occurrences could be normalized using similarity measures such as the Salton cosine, the Jaccard index, the equivalence index, and the association strength [18].
The selected algorithm was association strength, since it is proportional to the ratio between the observed number of co-occurrences of objects i and j and the expected number of co-occurrences of i and j under the assumption that the occurrences of i and j are statistically independent.
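In symbols, the association strength of items i and j can be written AS(i, j) = c_ij / (s_i · s_j), where c_ij is the number of co-occurrences and s_i, s_j are the total occurrences of each item; since the expected number of co-occurrences under independence is proportional to s_i · s_j, this ratio is proportional to observed over expected. A minimal sketch, where the small keyword matrix is illustrative:

```python
# Association strength normalization of a keyword co-occurrence matrix:
# AS(i, j) = c_ij / (s_i * s_j), proportional to observed / expected
# co-occurrences under statistical independence.

def association_strength(cooc):
    """cooc: symmetric dict-of-dicts of co-occurrence counts."""
    totals = {i: sum(row.values()) for i, row in cooc.items()}
    return {
        i: {j: c / (totals[i] * totals[j]) for j, c in row.items()}
        for i, row in cooc.items()
    }

cooc = {
    "cai": {"expert systems": 6, "ai": 2},
    "expert systems": {"cai": 6, "ai": 4},
    "ai": {"cai": 2, "expert systems": 4},
}
norm = association_strength(cooc)
```

Because both the observed count and the normalization are symmetric in i and j, the normalized matrix remains symmetric, which is what makes it usable for the undirected co-occurrence graph of Fig 9.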
For the grouping strategy, "Walktrap" was selected as one of the best alongside "Louvain" [19]. The graph is interpreted considering the following characteristics: the colors represent the groups to which each word belongs, and in this case there are three groups. In the first one, in red, computer-aided instruction dominates the citations. The green one has no central theme by citations but by relationships: Expert Systems, which connects topics of interest such as artificial intelligence. Finally, the third group, colored blue, seems to be a subgroup of the first one, focused on educational issues.
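Walktrap itself ships with libraries such as igraph; purely as an illustration of its core intuition, the following standard-library sketch groups nodes whose short-random-walk distributions are similar, since short walks tend to stay "trapped" inside densely connected groups. The toy graph, the greedy grouping, and the distance threshold are all illustrative simplifications; the real algorithm merges communities agglomeratively and scores them by modularity.

```python
# Simplified sketch of Walktrap's intuition: nodes with similar short
# random-walk distributions belong to the same community.

def walk_distribution(adj, start, steps=3):
    """Probability distribution of a `steps`-step random walk from `start`."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for node, p in dist.items():
            for nb in adj[node]:
                nxt[nb] = nxt.get(nb, 0.0) + p / len(adj[node])
        dist = nxt
    return dist

def communities(adj, steps=3, threshold=0.5):
    """Greedily group nodes whose walk distributions are close (L1 distance)."""
    dists = {v: walk_distribution(adj, v, steps) for v in adj}
    groups = []
    for v in adj:
        for g in groups:
            rep = dists[g[0]]
            keys = set(dists[v]) | set(rep)
            if sum(abs(dists[v].get(k, 0) - rep.get(k, 0)) for k in keys) < threshold:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups

# Two triangles joined by a single bridge edge -> two communities expected.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["e", "f", "c"], "e": ["d", "f"], "f": ["d", "e"],
}
groups = communities(adj, steps=3, threshold=0.5)
```

On this toy graph the sketch recovers the two triangles as separate groups; on real keyword graphs, igraph's Walktrap (or Louvain) should be used instead.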

Text Mining
Although the bibliometric analysis finds the authors and journals with the most impact in the specific field, the possible thematic fields based on the analysis of the KeyWords Plus, and a classification of these into groups, additional analysis is necessary to identify more specific thematic groups; for this, the KNIME software [20] was used.

Strings to Document
Converts the specified strings to documents. For each row, a document is created and attached to that row.
Preprocessing

This is a metanode, which groups several nodes responsible for multiple tasks, including part-of-speech tagging, lemmatization, and stop-word and number filtering. The elements inside this metanode are shown in Fig 12, and Table 3 describes each item presented in Figure 12.

N Chars Filter: Filters out all terms contained in the input documents with fewer than the specified number N of characters.
Stanford Tagger: Assigns each term a part-of-speech tag.
Stanford Lemmatizer: Lemmatizes the terms contained in the input documents.
Case Converter: Converts terms to upper- or lowercase.
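Outside of KNIME, the same preprocessing chain can be sketched in a few lines of Python; the stop-word list and the suffix-stripping stand-in for the Stanford lemmatizer below are deliberately naive and purely illustrative:

```python
# Stdlib sketch of the KNIME preprocessing metanode: case conversion,
# number and stop-word filtering, lemmatization, and an N-chars filter.
import re

STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}  # sample list

def lemmatize(term):
    """Very naive lemmatizer stand-in: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term

def preprocess(text, min_chars=3):
    terms = re.findall(r"[A-Za-z]+|\d+", text.lower())   # tokenize + lowercase
    terms = [t for t in terms if not t.isdigit()]        # number filter
    terms = [t for t in terms if t not in STOP_WORDS]    # stop-word filter
    terms = [lemmatize(t) for t in terms]                # lemmatize
    return [t for t in terms if len(t) >= min_chars]     # N-chars filter

print(preprocess("The 42 tutoring systems assessed learning in 2014"))
# ['tutor', 'system', 'assess', 'learn']
```

A production pipeline would use a real tagger and lemmatizer (as the Stanford nodes do); the sketch only shows the order and effect of the filters.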
One of the main elements of this algorithm is the Topic Extractor, which automatically finds the top K topics, with the N most relevant keywords each, discussed in a collection of unlabeled documents (an unsupervised setting). Its main characteristics are the following:
It represents documents as random mixtures over latent topics, where a distribution over words characterizes each topic.
Syntax or order of the words in the document is not important (bag of words model).
Document order is not important.
The same word can belong to different topics.
The number of topics needs to be selected/known in advance.
Two important hyperparameters of the Dirichlet distributions:
α controls the per-document topic distribution.
β controls the per-topic word distribution.
This process is known as the Simple parallel threaded implementation of LDA [21][22] (see Figure 13).
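The generative model above can be made concrete with a minimal collapsed Gibbs sampler for LDA. The toy corpus and the hyperparameter values are illustrative; KNIME's Topic Extractor uses a parallel threaded implementation of the same model:

```python
# Minimal collapsed Gibbs sampler for LDA: K topics, Dirichlet priors
# alpha (per-document topic mix) and beta (per-topic word mix).
import random
from collections import defaultdict

def lda_gibbs(docs, K=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})     # vocabulary size
    ndk = [[0] * K for _ in docs]                 # doc-topic counts
    nkw = [defaultdict(int) for _ in range(K)]    # topic-word counts
    nk = [0] * K                                  # tokens per topic
    z = []                                        # topic of each token
    for d, doc in enumerate(docs):                # random initialization
        zd = []
        for w in doc:
            t = rng.randrange(K)
            zd.append(t)
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
        z.append(zd)
    for _ in range(iters):                        # resample every token
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                       # remove current assignment
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta)
                           / (nk[k] + V * beta) for k in range(K)]
                t = rng.choices(range(K), weights)[0]
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    theta = [[(ndk[d][k] + alpha) / (len(docs[d]) + K * alpha)
              for k in range(K)] for d in range(len(docs))]
    return theta, nkw                             # doc-topic mix, topic-word counts

docs = [["student", "assessment", "model"] * 3,
        ["game", "language", "reading"] * 3,
        ["student", "assessment", "model", "performance"] * 2]
theta, topic_words = lda_gibbs(docs, K=2)
```

Note how the sketch matches the listed properties: word order never enters the counts (bag of words), document order is irrelevant, the same word can be assigned to different topics across tokens, and K must be fixed in advance.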
In Figure 14, the process for dimensionality reduction is presented, and Table 4 describes each item in Figure 14.

Interactive visualization
This is the metanode in charge of allowing the interactive visualization of emerging topics. The nodes found inside are shown in Fig 15, and Table 5 describes the items in the interactive visualization metanode in Figure 15.

Table View
Displays data in an HTML table view. The view offers several interactive features, as well as the possibility to select rows.

Scatter Plot
With this node, a scatter plot is obtained.

Tagging
This metanode groups the nodes presented in Fig 16. It is the last metanode in this section; it performs the tagging that allows viewing the word cloud and the texts associated with each topic.
Table 6 describes the items in the Tagging metanode. This node recognizes named entities specified in a dictionary column and assigns a specified tag value and type. Optionally, the recognized entity terms can be set as unmodifiable, meaning that the terms are not modified or filtered afterward by any following node.
Tag Filter
Filters terms contained in the input documents that have specific tags assigned. A term is not filtered out if at least one of its assigned tags is among the specified tags. If strict filtering is set, all of a term's assigned tags must be specified tags.

Bag of Words Creator
Creates a bag of words from a set of papers. It consists of at least one column containing the terms that appear in the corresponding document. The programmer can interact with the result and customize the display.

IDF
Inverse Document Frequency: determines the number of documents containing the T concepts, which come from KeyWords Plus.
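Together, the Bag of Words Creator and the IDF node can be sketched directly. In its simplest smoothing-free form, idf(t) = log(N / df(t)), where N is the number of documents and df(t) the number of documents containing term t; the toy documents below are illustrative:

```python
# Stdlib sketch of document frequency and IDF over tokenized documents:
# terms that appear everywhere score 0, rarer terms score higher.
import math

def document_frequency(docs):
    df = {}
    for doc in docs:
        for term in set(doc):                 # count each term once per doc
            df[term] = df.get(term, 0) + 1
    return df

def idf(docs):
    n = len(docs)
    return {t: math.log(n / f) for t, f in document_frequency(docs).items()}

docs = [["student", "model"], ["student", "game"], ["student", "model", "emotion"]]
scores = idf(docs)
# "student" appears in every document -> idf 0; rarer terms score higher.
```

This is why IDF is useful downstream of the bag of words: it downweights terms that are ubiquitous in the corpus and highlights topic-specific ones.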

String to Term
Converts the strings of the specified string column to terms and appends a new column containing these terms.
Tag Cloud
A tag cloud view using JavaScript libraries, which can be customized.

Document Data Extractor
It is responsible for extracting desired information in columns.
After going through these nodes, the algorithm returned the following result. In Fig 17, all the selected terms are classified into five topics from the 1,369 abstracts; each topic requires interpretation. However, the focus of the analysis was to determine whether any of them were related to the category of interest: evaluation.
The program interface allows the analyst to explore each of the five topics, as shown in Fig 18. For example, topic_0 contains the terms game, instruction, intelligent, language, reading, skill, strategy, study, and system. The "document" column displays the text and the weight contributed to each of the terms.
The topic_3, represented in yellow in Fig 19, emerges naturally among the analyzed abstracts. The terms that compose it are affective, assessment, data, emotion, method, model, performance, result, student, and system, all of them with high values for this study. Therefore, this result, with high values, was the selection criterion used to link the full texts analyzed in NVIVO in the next phase.
One hundred sixty-four papers were selected from the text mining of the emerging group represented in Fig 19. It is essential to consider that the weight of the term assessment is not high compared to the other terms identified in topic_3, and even less so compared to the total number of identified terms.

Results
The results are presented in this section; a year-wise representation is given in Fig 21. These results are organized by the research questions posed earlier in this study. The variables of the selected studies are presented in Table 7.
Q1: What is the ITS primary evaluation purpose?
Q2: What is the main evaluating agent (in evaluation processes)?
Q3: What is the main approach used in the selected ITS?
Q4: Is the ITS evaluation process implemented holistically?

Next, we present each research question and its results.
Q1: What is the main purpose of the evaluation in these ITS?
According to the data found, the primary purpose of the evaluation is summative; that is, most of the evaluation sections in the analyzed ITS tried to establish reliable balances of the results obtained, focusing on the collection of information and the elaboration of instruments that allow reliable measurement of the knowledge to be evaluated at the end of a teaching-learning process.
Q2: What is the main evaluating agent (in evaluation processes)?
The main evaluating agents were those external to the student or their peers; that is, hetero-evaluation was prioritized. This is consistent with the purpose found in question 1. Most ITS identify gaps or "weak spots" that need to be reinforced before moving forward with the program, and design remedial activities aimed at the group or individuals who require them.
Q3: What is the main approach used in the selected ITS?
The main approach found was the quantitative one, which makes sense since intelligent tutors use data to achieve process automation. However, qualitative approaches were evidenced to a lesser extent, and in some cases the use of both was possible, thanks to technological developments that allow interpreting the participants' emotions and language.
Q4: Is the evaluation process implemented in ITS holistic?
To answer this question, the criterion was the following: in each of the selected papers, diagnostic, formative, and summative evaluation elements were sought. It was also tracked whether the ITS made use of hetero-evaluation, peer evaluation, and self-assessment. Finally, it was determined whether it integrated qualitative and quantitative approaches, all of this to account for a holistic assessment that favors deep learning. Texts that met all these criteria would be classified as holistic.
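The classification rule just described can be stated compactly as a checklist. In the sketch below, the paper record is hypothetical and the dimension names merely mirror the criteria above:

```python
# Sketch of the Q4 classification rule: a paper counts as "holistic"
# only if it covers all three evaluation purposes, all three evaluating
# agents, and both approaches. The paper record is hypothetical.

CRITERIA = {
    "purposes": {"diagnostic", "formative", "summative"},
    "agents": {"hetero", "peer", "self"},
    "approaches": {"quantitative", "qualitative"},
}

def is_holistic(paper):
    """True only if every criterion set is fully covered by the paper."""
    return all(CRITERIA[dim] <= paper.get(dim, set()) for dim in CRITERIA)

paper = {
    "purposes": {"summative"},
    "agents": {"hetero"},
    "approaches": {"quantitative"},
}
print(is_holistic(paper))  # False: summative-only, hetero-only, quantitative-only
```

The example paper reflects the dominant profile found in the corpus, which is precisely why no analyzed text passed the test.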
Under the criteria applied, it is possible to affirm that holistic designs were not found in the analyzed texts. In particular, special attention to diagnostic and formative evaluation is required. It is also necessary to encourage the participation of other agents in the evaluation processes of ITS, specifically peer evaluation and the participation of other actors, such as the family. Finally, a mixed approach can enrich the reading of the process; the qualitative evaluative aspects of ITS are a technical challenge, but they can be included through properly trained bots.

Emerging challenges
Based on Table 8, it was possible to identify the analysis focuses and propose the following challenges.
Demonstrate the pedagogical value of scaffolding by intelligent tutors.
According to Arevalillo-Herráez [166], facilitating problem-centered instructional models requires providing scaffolding, that is, contingent support from another, more capable person who helps others solve complex problems and acquire valuable skills in doing so, including deep content learning, argumentation skills, and problem-solving skills. Traditionally, providing this type of coaching requires small groups and personalized training processes.
With the help of intelligent tutoring systems, it is possible to provide this support in large groups; however, the expected learning outcomes of scaffolding respond to different variables, such as cognitive, motivational, or metacognitive aspects. Regarding the cognitive aspects, it has been found that intelligent tutoring systems favor significant progress. However, the motivational and metacognitive aspects require further research to demonstrate their pedagogical value.
This can be evidenced by the priority given in the selected full texts to evaluating summative aspects.
Link an efficient evaluation mechanism.
Current trends indicate that online learning has become a vital learning mode; however, a holistic evaluation mechanism was not identified in the analyzed texts.
Learning performance assessment aims to assess what students learned during the learning process. It is usually summative or formative; however, in some ITS both have been conflated with grading, focusing on producing a numerical value. This is clearly due to the learning framework in which each piece of research is inscribed. However, to mobilize higher thinking skills such as problem-solving, critical thinking, or creativity (typical of deep learning), and according to the results found in Table 2, it is necessary to complement this approach with qualitative approaches.
Use multiple data sources.
The fundamental challenges to consider when thinking about an intelligent tutor are usually the data sources to feed the predictive models, which come from the summative assessment, such as the result of exercise A or the performance in unit B. However, it is crucial to determine the pedagogical value of the actions that led to these results and the implications of these data in predicting the participants' performance [30] [35].
Need to link e-learning environments with intelligent tutoring systems.
Assessment of students' performance on exercises can delay the tutor's feedback to students for days or even weeks. In some cases, tutors may then have to reduce the number of assignments given to their students due to time constraints. Especially in large-scale courses, accurate and meaningful evaluation is a demanding task for tutors, and accuracy is often difficult to achieve for both subjective and objective reasons.
Possible solutions to the emerging challenges.
In the above discussion, several challenges were identified. To address them, the following research challenges are posed.
Understand and implement the difference between evaluating and grading.
Intelligent tutoring systems require moving toward an interpretation of the numerical results that allows for feedback as proposed by Daniel Wilson, director of the "Zero" project at Harvard University, who indicates that the feedback process consists of four ascending phases: clarify, value, express concerns, and make suggestions. This focuses communication with the student on the construction of meaning, toward the achievement of deep learning [187]. Currently, developments have focused on grading.
Designing a holistic framework.
The theory of conscious processes, elaborated by Álvarez de Zayas [1], is of a systemic, holistic, and dialectical nature, that is to say, complex. It presents a redefinition of the school as a space where teaching and systematization take place, essentially to give way to the training process. An ITS designed under this perspective understands evaluation in a systemic, articulated, holistic, and dialectical way. The test is relational and is not the only instrument for obtaining information about the teaching and learning processes. It includes aspects related to the purpose, the extension, the evaluating agents, the moments, the approaches, and comparison standards. Dialectically produced tools are used between components and between actors.
Focus on the process, not just the outcome.
To address this aspect, ITS must move toward formative evaluation, which implies collecting, analyzing, and identifying student progress (learning monitoring) and reflecting, providing feedback, reorienting, and creating support strategies for students (pedagogical use of the results). The latter is a technological challenge, which implies training the ITS not only with quantitative data.
Implement Learning Analytics Systems that impact the curriculum.
When the evaluation process is done correctly, changes to the curriculum emerge naturally, enabling the student to access authentic deep learning. This line of research would imply establishing a framework that allows arti cial intelligence to detect new learning goals for the student based on the analysis of mixed data.

Conclusions
The use of text mining was fundamental for extracting knowledge from a wide field of academic production. Other researchers in different fields can use the workflow adapted in KNIME to optimize reading time and focus attention only on the aspects of interest.
Based on the research on intelligent tutors, it was possible to identify that progress has been made in detecting concepts that require further study and in providing constant, personalized, and automatic feedback to students and teachers. However, it is necessary to propose a framework that offers mixed feedback to students and teachers and facilitates decision-making based on predictive methods: an evaluation that transcends grading, made possible by the fusion of pedagogical and technological aspects.
Deep learning seeks to give meaning to new information; that is, it aims to incorporate a critical perspective on certain learning and, in doing so, favor its understanding and long-term retention. Achieving it requires moving toward a complex evaluation that involves different forms of evaluation, actors, moments, approaches, and analyses.
ITS require moving toward an interpretation of the numerical results that allows communication with the student to focus on constructing meaning, toward a holistic evaluation. This holistic evaluation includes the participation of the student and their peers, and diagnostic, formative, and summative aspects. These changes will make it possible to account for the depth of learning achieved.
Moving toward this type of evaluation involves analyzing quantitative and qualitative variables. Therefore, it is necessary to create a framework that allows artificial intelligence to integrate all these variables and effectively communicate its results. In other words, an ITS is required that is capable of assessing and measuring all the variables related to deep learning and achieving a truly holistic assessment.

Declarations
Ethics approval and consent to participate.

Consent for publication
Not applicable.

Availability of data and materials

All data generated or analyzed during this study are included in this published article.

Competing interests
The authors declare that they have no competing interests.

Funding
Not applicable.

Authors' contributions
The author read and approved the final manuscript.

Figure captions:
Basic components. Adapted from [10] and [11].
Production per year.
Figure 5: Most relevant sources.
Figure 6: Total citation of the main sources.
Figures 9-11: Co-occurrence of words in KeyWords Plus from all sources.
Figure 12: Metanode preprocessing.
Figure 13: Topic Extractor.
Figure 14: Reducing dimensionality, assigning colors.
Figure 15: Metanode interactive visualization.
Supplementary file: BibliometrixExportFile20210428.xlsx