In Philip K. Dick’s visionary novel “Do Androids Dream of Electric Sheep?”, the line between human and artificial intelligence (AI) blurs, prompting readers to ponder the capabilities of AI. Inspired by this work, our study explores the performance of AI in academia. Given the widespread debates on the use of generative AI such as ChatGPT at universities worldwide, we focus in particular on the performance of generative AI in assessing the acceptance of generative AI in higher education policies. Specifically, moving beyond the novel’s speculative fiction, our study compares generative AI evaluation with human evaluation and validates it using the Technology Acceptance Model (TAM) and the Unified Theory of Acceptance and Use of Technology (UTAUT) frameworks.
Performance of AI
Recent research across various disciplines demonstrates that generative AI technologies perform well in passing professional and academic exams. A growing body of medical research has examined the performance of AI in passing exams. On the United States Medical Licensing Exam (USMLE), generative AI performed at or near the passing threshold of 60% accuracy (Kung et al., 2023). Similarly, another study found that generative AI performed at a level comparable to a third-year medical student on the USMLE Step 1 and Step 2 exams (Gilson et al., 2023). Further, a previous study found that generative AI achieved a 79.9% correct response rate on the Japanese Medical Licensing Exam, notably outperforming the average examinee by 17% on hard questions (Takagi et al., 2023). Furthermore, a prior study found that generative AI outperformed Japanese medical residents on the General Medicine In-Training Examination, particularly in areas requiring detailed medical knowledge and on difficult questions (Watari et al., 2023).
Another body of research in engineering and computer science has examined the performance of AI in passing exams. Computer science research found that generative AI achieved a score that just met the passing threshold of a computer science exam focusing on algorithms and data structures (Bordt & von Luxburg, 2023). Engineering research found that generative AI performed well across various tasks, including theoretical questions, programming, and practical circuit design, with a cumulative grade of 73% (Elder et al., 2023). A recent study found that generative AI was capable of solving simple math problems and addressing undergraduate-level mathematics questions (Frieder et al., 2024).
Management and financial research has also examined the performance of AI in passing exams. Previous research found that on the Operations MBA final exam, generative AI offered correct answers with excellent explanations for basic questions (Terwiesch, 2023). An experimental study found that generative AI performed exceptionally well in economics exams, scoring higher than the average college student in both microeconomics and macroeconomics tests (Geerling et al., 2023). A previous study also found that generative AI performed well in solving basic undergraduate finance problems, with an 85% accuracy rate (Yang & Stivers, 2024).
Thus, recent studies in fields such as medicine, management, and engineering provide evidence that AI technologies exhibit impressive performance in completing certain tasks, but there is little research concerning the performance of AI in education policy. The opportunities and challenges of adopting AI-generated content in educational policy are widely debated, and university guidelines reflect this divide (McDonald et al., 2024; Moorhouse et al., 2023). However, quantitatively assessing the acceptance of generative AI presents a challenge. To address this gap, we investigate the performance of AI in analyzing the text of guidelines, offering a method to measure the acceptance of generative AI in higher education. Specifically, this study examines the performance of generative AI in text evaluation by exploring the following two questions. Can generative AI evaluate the acceptance of generative AI as effectively as humans? Can we validate its evaluation with criterion evaluations of other aspects, including performance expectancy, university conditions, and perceived risk?
Evaluating Acceptance of Generative AI: Generative AI and Human Perspectives
For the first question, we argue that generative AI is capable of evaluating the acceptance of generative AI as effectively as humans, given the previous literature on the performance of AI in text evaluation. An experimental study examined the performance of generative AI feedback on writing and students’ preferences for it among English-as-a-new-language students (Escalante et al., 2023). The experimental group received feedback from generative AI, and the control group received feedback from their human tutor. The results showed that students who received feedback from generative AI improved their writing skills to a degree comparable to students who received feedback from their human tutors. Those students were also split fairly evenly in their preferences between generative AI and human feedback, showing that each form of feedback has its own perceived benefits. Further, another study investigated the use of generative AI for automated essay scoring in assessing TOEFL essays (Mizumoto & Eguchi, 2023). Mizumoto and Eguchi (2023) compared AI-generated scores to human benchmarks and explored the effect of incorporating linguistic features on scoring accuracy. The results revealed that generative AI could score essays with a certain level of accuracy and reliability, especially when combined with analysis of linguistic features. Furthermore, prior research explored the performance of generative AI in supporting teachers of English as a foreign language (Guo & Wang, 2023). Guo and Wang (2023) asked teachers to evaluate both generative AI’s feedback on student writing and human teachers’ feedback. The results showed that generative AI generated more feedback than teachers, distributing attention evenly across content, organization, and language aspects. Overall, these findings suggest that generative AI performs certain tasks, including text evaluation, as effectively as humans, if not more so in some aspects.
Hypothesis 1
Human-rated acceptance of generative AI is positively associated with generative AI-rated acceptance of generative AI.
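To make the intended association concrete, the following is a minimal illustrative sketch of how Hypothesis 1 could be tested, assuming one human-coded and one AI-coded acceptance score per university guideline; the data file and column names are hypothetical and do not represent the study’s actual analysis pipeline.

```python
# Minimal sketch (hypothetical data and column names): correlating human-coded
# and AI-coded acceptance scores, one row per university guideline.
import pandas as pd
from scipy import stats

df = pd.read_csv("guideline_ratings.csv")  # hypothetical file: human_acceptance, ai_acceptance

# Pearson correlation tests the positive association stated in Hypothesis 1;
# Spearman is a rank-based check that is robust to non-normal rating scales.
pearson_r, pearson_p = stats.pearsonr(df["human_acceptance"], df["ai_acceptance"])
spearman_rho, spearman_p = stats.spearmanr(df["human_acceptance"], df["ai_acceptance"])

print(f"Pearson r = {pearson_r:.2f} (p = {pearson_p:.3f})")
print(f"Spearman rho = {spearman_rho:.2f} (p = {spearman_p:.3f})")
```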
Validating Generative AI Evaluation: Expectancy, Conditions, and Risk
As for the second question, we argue that we can validate its evaluation by confirming its correlations with evaluations of other aspects on which research within the Technology Acceptance Model (TAM) and the Unified Theory of Acceptance and Use of Technology (UTAUT) framework focuses (Abdaljaleel et al., 2024; Ben Arfi et al., 2021; Davis, 1989; Polyportis & Pahos, 2024; Venkatesh et al., 2003). Venkatesh et al. (2003) developed UTAUT as a thorough synthesis of previous technology acceptance studies by reviewing the existing models, including TAM. UTAUT includes four key constructs: performance expectancy, effort expectancy, social influence, and facilitating conditions. In particular, we focus on perceived risk in addition to performance expectancy and facilitating conditions. Our focus was informed by recent discourse and research articles in the respective contexts, in which universities expect generative AI to improve teaching and learning activities and prepare support and resources for it, but are concerned about academic integrity (Abdaljaleel et al., 2024; Crompton & Burke, 2024; Harvard University, 2023; Imperial College London, 2023; Stanford University, 2023).
Performance expectancy is the extent to which using a technology will provide benefits to individuals in performing certain activities (Venkatesh et al., 2003, 2012). In the context of generative AI in higher education policies, benefits involve the degree to which generative AI tools will support and improve teaching and learning activities, contributing to educational outcomes in academic research and activities. A large body of research has shown that performance expectancy can be a determinant of AI technology acceptance (Andrews et al., 2021; Chatterjee & Bhattacharjee, 2020; Guggemos et al., 2020; Raffaghelli et al., 2022). A recent survey showed that performance expectancy had a positive effect on students’ attitudes towards generative AI (Foroughi et al., 2023). That is, when students believed that using the chatbot would bring them benefits such as convenience, efficiency, or effectiveness in their educational tasks, they had a positive attitude toward the chatbot. Given that performance expectancy has a positive effect on accepting new technologies, universities that expect AI to greatly assist in personalized learning, efficient data analysis, and creative academic endeavors should accept generative AI. Therefore, we hypothesize,
Hypothesis 2
Performance expectancy is positively associated with acceptance of generative AI.
Facilitating conditions are defined as individuals’ perceptions of the availability of resources and support necessary for performing activities (Brown & Venkatesh, 2005; Venkatesh et al., 2003). Within the context of generative AI in higher education policies, these resources and support specifically include the extent to which universities provide the essential technical, academic, and policy frameworks required for the effective integration of generative AI tools. Numerous studies have suggested that facilitating conditions are another determinant of individuals’ acceptance of AI technologies (Cabrera-Sánchez et al., 2021; Chatterjee & Bhattacharjee, 2020; Kwak et al., 2022). More recently, a survey study found that facilitating conditions had the strongest effect on students’ intentions and actual usage of generative AI (Habibi et al., 2023), indicating that when students had access to the necessary resources for using generative AI, such as a laptop and internet connection, they not only had stronger intentions to use generative AI but also used it more frequently. Given these findings on facilitating conditions and technology acceptance, universities that provide the required resources and support for leveraging generative AI should endorse its acceptance. Accordingly, we hypothesize,
Hypothesis 3
Facilitating conditions are positively associated with acceptance of generative AI.
Perceived risk refers to an individual’s subjective evaluation of the potential negative outcomes associated with a specific action (Abdaljaleel et al., 2024; Ben Arfi et al., 2021; Zhang et al., 2019). In the context of generative AI in higher education policies, such risk covers the degree to which a university perceives potential risks associated with the use of generative AI tools in academic work, such as cheating, misinformation, and copyright violations, which can undermine student learning. Some studies have suggested that perceived risk is a potential determinant of individuals’ acceptance of new technologies (Ben Arfi et al., 2021; Zhang et al., 2019). Recent research found a negative effect of perceived risk on acceptance of generative AI, showing that when students and faculty members believed that using generative AI to answer academic queries was risky, they did not think that generative AI in higher education was good for society (Jain & Raghuram, 2024). A multinational study replicated this result (Abdaljaleel et al., 2024). Given that perceived risk has a negative effect on accepting new technologies, universities that perceive potential risks associated with the use of generative AI tools in higher education should be cautious about endorsing its acceptance. Thus, we hypothesize,
Hypothesis 4
Perceived risk is negatively associated with acceptance of generative AI.
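As an illustration of how Hypotheses 2–4 could be examined jointly, the sketch below regresses rated acceptance on performance expectancy, facilitating conditions, and perceived risk. It is a hedged example under assumed variable names and a hypothetical dataset, not the study’s actual model specification.

```python
# Illustrative sketch (hypothetical variable names): acceptance of generative AI
# regressed on the three focal predictors from Hypotheses 2-4.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("guideline_ratings.csv")  # hypothetical dataset, one row per guideline

# Hypotheses 2 and 3 expect positive coefficients; Hypothesis 4 expects a negative one.
model = smf.ols(
    "acceptance ~ performance_expectancy + facilitating_conditions + perceived_risk",
    data=df,
).fit()
print(model.summary())
```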
Present Study
The present study rigorously examines the performance of generative AI in assessing the acceptance of generative AI in guidelines from top-ranked universities worldwide. We aim to understand how well generative AI evaluation corresponds with human judgments in terms of acceptance, performance expectancy, facilitating conditions, and perceived risk associated with the use of generative AI. The purpose is twofold: first, to investigate the performance of generative AI in accurately reflecting human perspectives on the adoption of AI within educational policies; and second, to examine the validity of generative AI evaluation against existing technology acceptance frameworks, including TAM and UTAUT.