To provide a comprehensive evaluation of ChatGPT's language processing abilities and shed light on the mechanisms underlying its performance, this section presents a competence-based analysis of ChatGPT's task responses across three key areas: language proficiency, reasoning ability, and the structuring and relevance of responses. First, we evaluate ChatGPT's language proficiency by analysing its ability to understand and respond to different types of language input, including natural language, formal language and technical jargon, and by examining the accuracy and coherence of its responses in relation to the context and intent of the input. Second, we assess ChatGPT's reasoning ability: its capacity to process and analyse information and to make logical deductions and inferences. This includes evaluating its ability to understand complex concepts, identify patterns and draw conclusions from the available information. Finally, we examine the structuring of ChatGPT's task responses and their relevance. In the context of learning competencies, attention is crucial for attending to relevant information and filtering out distractions, and we explore how ChatGPT processes and utilises information to generate responses.
Language Proficiency
In the context of CBL, language learning involves defining the specific skills and abilities learners should acquire through the completion of assessments. This allows learners to focus on the key skills they most need to develop rather than covering a broad range of language knowledge (Henri et al., 2017). In view of this, we assessed the language proficiency of ChatGPT. ChatGPT produced fluent and clear sentences with clear headings, and responded well when asked direct questions that did not require depth of interpretation. Its responses used relevant language and, for Education, referred to specific standard elements and methods that were relevant to that question's research plan.
Generally, syntax, grammar and spelling were at an acceptable level for submission at undergraduate and postgraduate levels, though both the Education and Law groups found that the statements and prose lacked substance, creativity and nuance. This is a source of concern: in Social Work, the writing style of some students has been seen to shift suddenly from clunky, error-laden English to the clear syntax that ChatGPT produces, which may conceal students' actual writing ability and their development of English fluency. One important goal of CBL is its ability to accommodate individual differences in learning (Evans et al., 2015). This goal can be achieved through strategies such as individualised learning, autonomy, continuous learning and students' control of their own education, all of which can be undermined when students rely too heavily on such chatbots. Given that the primary purpose of CBL is to improve student outcomes, we find this problematic, as it raises questions about how the technology can help improve student achievement. Improving students' achievement requires breaking down desirable skill sets into discrete competencies that can build on each other (Brumm et al., 2006a; Brumm, Mickelson et al., 2006), and doing so requires a clear picture of students' actual competencies.
The Accounting group found that ChatGPT was able to complete basic ratios, correctly calculating three of four, which would have earned a score of 8 out of 10. For more sophisticated questions, such as undertaking acquisition analysis, passing consolidation journal entries with narrations and determining non-controlling interest under group accounting, the responses lacked information pertinent to answering such questions. Interestingly, when asked to calculate the effect of understated assets on a firm's non-controlling interests, the AI refused, citing an ethical breach in financial reporting. While this is a useful note of caution for students, there are real-world applications of this task that accountants must perform, for example, when preparing firms for financial audits. ChatGPT thus seems unable to discern when an ostensibly 'unethical' action may serve a wider purpose, and cannot contextualise such actions when needed. This again raises questions about how students can build their competencies within this context. As highlighted in the literature (Brumm, Mickelson et al., 2006; Henri et al., 2017), CBL shifts from the traditional time-constrained system to a knowledge-based system in which students are expected to progress at their own pace while mastering what is expected.
Critical Thinking and Reasoning
Industry demands more competent and qualified employees equipped with critical and analytical competencies, and research shows that meeting these goals requires strategies such as CBL (Henri et al., 2017). This is particularly important as the number and types of competencies required of graduates are constantly changing (Sutcliffe et al., 2005). One of the essential skills required of today's graduates is critical thinking: the ability to make judgements clearly and rationally through the processing, engagement and evaluation of information. Given that critical judgements draw on many approaches and sources, including what the individual has learnt, known, understood, examined, experienced, seen or even heard, we examined ChatGPT's responses to assess its ability to make judgements supported by highly relevant examples and overall depth in a response.
Across all disciplines, ChatGPT was unable to understand context and generate answers that engaged meaningfully with appropriate case studies. In Management, it could not examine particular organisations; as a result of this lack of contextual understanding, the SWOT analysis required for the organisation in question did not take place. In Social Work and Law, the particular laws and regulations that social workers must abide by within an Australian context, or the relevant laws within particular contexts, were not understood or mentioned in any way. Within Social Work, the particular services that might help characters within case studies were very generalised and lacked the nuance required to meet the specific and complex needs that service users often have. In both Management and Social Work, the specific and necessary interventions were not proposed, rendering such responses unpassable. Education also found a complete lack of nuance in responses to the socio-cultural realities asked about in the research question for their assessment task.
One of the most exposing elements of such responses came when ChatGPT was required to generate critical reflections or responses to case studies, which usually present complex circumstances requiring discernment and sound judgement to answer reasonably. For Management, this meant a dearth of examples of key practices and interventions that would emulate best-practice principles. For Social Work, the critical reflection revolved around what a social worker ought to do in practice, rather than what the student actually did in, for example, a role play with another student enacting a typical interaction with a client. Within Social Work practice, critical reflection is a cornerstone of developing professional competence, and it was clear that ChatGPT had no grasp of a) the scenario it was asked to critically reflect upon, and b) the subjectivities the writer needs to draw upon in order to answer such questions satisfactorily. Education also found the lack of context a prevailing reality for assessment questions, which reduced the overall response quality and does little to help students apply their skills and knowledge to diverse contexts. ChatGPT seems competent at adhering to basic commands such as 'discuss', 'evaluate' and 'explain', though it seems unable to compare, evaluate, analyse and generate references in the service of higher-order skills. Where a directive appeared in an assessment question, the AI bot tended to focus on it to the exclusion of other details or directives that formed a subsequent part of the question. As such, the need for human thinking and more sophisticated thought seems to be beyond the bot's ability, at least for now.
One key limitation is the flow of ideas. In many parts, the response started with one idea and quickly jumped to another point without exhaustively discussing or explaining the previous one. This led to a lack of critical analysis, or to discussion that was not relevant to the context.
Structuring and Relevance of Response
One of the key advantages associated with CBL is assessment structuring. In CBL, instructors typically provide clear instructions about the competencies to be learned and assessed, which helps direct the design and structuring of learning materials and assessments (Baughman et al., 2012). This means students and instructors have a clear roadmap throughout the unit (Evans et al., 2015; Di Trapani & Clarke, 2012). In light of this, we assessed the structuring and relevance of the responses generated.
While ChatGPT was able to adhere to the generic structures required by some questions, it tended to repeat sections of the response, with an overall lack of continuity between one part and the next. This may have been influenced by the fact that the separate parts of the scaffold were entered into the AI one at a time, though the same tendency appeared when an entire question was entered at once. This lack of synthesis and argument development made responses poorer overall and did not demonstrate the higher levels of sophisticated thinking required, particularly when conducting research of any kind. The responses provided by ChatGPT are nonetheless a good starting place for students, whether they are developing research questions, surveying general trends or attitudes within a particular area of practice, or seeking broad information on a topic. There remains a clear need for relevant and contemporary referencing, as well as more nuanced and critical thinking that demonstrates the capacity of the human mind to make connections, link subtleties, and present cohesive and convincing arguments in ways that the AI bot cannot.
When asked to produce deadlines for an Education research project, ChatGPT ambitiously stated that it would complete 300 parent interviews in both English and Spanish (a nod to the America-centric nature of the app) within four weeks, which seemed quite implausible to the research team. For the Social Work team, there appeared a rather unhelpful response: "Working with vulnerable populations or on sensitive issues may have led to feelings of empathy and compassion, while working with challenging individuals may have resulted in frustration or burnout". Apart from the naivety of such a statement, this reductionist approach to the emotional responses of social workers shows no meaningful reflection on, or subjective response to, what a social work student may experience when working with such service users, which is precisely what the question was asking. The statement implies that such reactions are automatically aroused simply because a population is vulnerable, or that challenging individuals similarly lead to frustration or burnout among social workers. This emotional oversimplification is something to look out for in future assessments or AI-generated manuscripts, as it lacks the nuance and the personal, subjective experience that our diverse student cohorts bring.
Across all discipline groups, ChatGPT produced a dearth of relevant and contemporary literature, and the references it did provide often did not exist when the research team searched for them. When a generic assessment question is entered without specific instruction on how many sources are required, ChatGPT typically provides no references at all. This may offer students an unhelpful reversal of the assessment process: the AI bot makes statements, which students then need to source to fulfil the referencing requirements for their assessments. In effect, students may complete no meaningful literature searches at all across an assessment, subject or course, simply finding isolated statements that match what ChatGPT has produced. Even where references were provided, none of them was correct, indicating that students will have to verify every piece of information the chatbot provides if they want to use it.
While the initial ChatGPT-produced responses were not properly structured across any of the disciplines, the second round of responses was better structured because the questions were scaffolded. Scaffolding is considered one of the best strategies in CBL, providing students with directions that reduce the complexity of the task (Belland et al., 2015). Our findings above suggest that such strategies also make the task easier for ChatGPT, producing better responses. As stated in the literature, such scaffolding should be complemented with other strategies, such as fieldwork or experiential learning, in which students are required to apply knowledge from written assessments in real-world situations (Bensah et al., 2011; Evans et al., 2015).
Implications
The emergence of OpenAI's ChatGPT has several implications for the education sector. Despite the weaknesses and limits of the responses it generates, ChatGPT is far more capable than its predecessors in its potential threat to academic integrity. Students can use it to plan answers to questions that require descriptions and outlines of specific content (Yorio, 2023), and it can formulate responses that fit specific response types and writing styles within minutes (Dowling & Lucey, 2023). Furthermore, students can use the AI to help plan and locate information for specific assessment tasks, both formal and informal. Based on the marked responses of all the groups, its strengths lie in structure, language conventions, and locating and summarising relevant information, at least at a surface level. Given the widespread media attention ChatGPT has received, the number of users will inevitably increase; this will build collective knowledge of ChatGPT and lead to greater ease of use as the collective skill set and overall knowledge of the software grow (Joo et al., 2018).
Everyone involved in the higher education sector must eventually adapt to the use of ChatGPT and adjust assessment methodologies accordingly. It may also prompt some long overdue revisions of assessment in the education sector. If the core reason for assessment is considered at the design stage, educators can account for the use of ChatGPT and eliminate risks (Dowling & Lucey, 2023). For example, if the assessment is formative in nature and is designed to navigate students through a process of learning new content or skills, the design could account for the use of ChatGPT by including mandatory interaction between the educator and the student, providing a continuous and fluid demonstration of learned skills or content. This does not necessarily mean assessment needs to be conducted solely via interviews or in-class tests; it simply suggests that educators need to be aware of students' progress and the tools available to them, including ChatGPT, and differentiate assessment and learning processes as they see fit. Higher education institutions may also use practice-based assessment alternatives to counter the potential downsides of academic cheating and misconduct associated with the use of ChatGPT. Further, encouraging students who use the AI system to acknowledge its contribution to their assignments could improve academic openness and integrity.
The limitations of this study provide pointers for future research. First, this study has not investigated the factors that determine students' use of applications like ChatGPT, from the students' perspective. Forthcoming studies can apply relevant theories, such as the Technology Acceptance Model, to investigate the antecedents of ChatGPT use among students. This information will be useful for educational institutions and policymakers in understanding the motivation for ChatGPT use and informing decisions on assessment design. Second, this study has not undertaken comparison across different groups of people or contexts to understand the extent to which contextual factors influence the use of ChatGPT. Future studies should consider comparing the use of ChatGPT across groups or contexts to provide a more comprehensive understanding of ChatGPT use and the impact of contextual factors. Third, this study has not applied a longitudinal approach, especially given that ChatGPT is relatively new. Upcoming studies can apply a longitudinal approach to investigate the use of applications such as ChatGPT. This will help distinguish users who continue to use these applications from those who stop after a while, providing insight into the reasons for stopping and continuing. Fourth, the subject disciplines across which the use of ChatGPT is assessed in this study are limited to Law, Accounting, Social Work, Management and Education. Future studies can consider extending the study to more disciplines to provide a broader analysis of the application of ChatGPT and enable comparison across disciplines to identify any contextual factors impacting its use. Fifth, our study did not obtain the perspectives of people who have used ChatGPT to generate and submit assignments, so we are unable to provide an account of their experiences.
Forthcoming studies should consider collecting data from participants who have used ChatGPT for writing essays or assessments to generate insight into the experience of using the chatbot.