To provide a comprehensive evaluation of ChatGPT's language processing abilities and shed light on the mechanisms underlying its performance, this section presents a competence-based analysis of ChatGPT's task responses across three key areas: language proficiency, reasoning ability, and the structuring and relevance of responses. First, we evaluate ChatGPT's language proficiency by analysing its ability to understand and respond to different types of language input, including natural language, formal language and technical jargon, and by examining the accuracy and coherence of its responses in relation to the context and intent of the input. Second, we assess ChatGPT's reasoning ability: its capacity to process and analyse information and to make logical deductions and inferences. This includes evaluating its ability to understand complex concepts, identify patterns and draw conclusions from the available information. Finally, we examine the structuring of ChatGPT's task responses and their relevance. In the context of learning competencies, attention is crucial for attending to relevant information and filtering out distractions, and we explore how ChatGPT processes and utilises information to generate responses.
Language Proficiency
In the context of CBL, language learning involves defining the specific skills and abilities learners should acquire through the completion of assessments. This allows learners to focus on the key skills they most need to develop rather than covering a broad range of language knowledge (Henri et al., 2017). In view of this, we assessed the language proficiency of ChatGPT. ChatGPT produced fluent and clear sentences with clear headings, and responded well when asked direct questions that did not require depth of interpretation. Its responses used relevant language and, for Education, referred to specific standard elements and methods that were relevant to that question's research plan.
Generally, syntax, grammar and spelling were at an acceptable level for submission at undergraduate and postgraduate levels, though both the Education and Law groups found that the statements and prose lacked substance, creativity and nuance. This is a source of concern: in Social Work, the writing style of some students has been seen to shift suddenly from clunky, error-laden English to the clear syntax that ChatGPT produces, which may conceal students' actual writing ability and their development of English fluency. One important goal of CBL is its ability to accommodate individual differences in learning (Evans et al., 2015). This goal can be achieved through strategies such as individualised learning, autonomy, continuous learning and students' control of their own education, all of which can be undermined when students rely too heavily on such chatbots. Given that the primary purpose of CBL is to improve student outcomes, we find this problematic, as it raises questions about how the technology can help improve student achievement. Improving students' achievement requires breaking down desirable skill sets into discrete competencies that can build on each other (Brumm et al., 2006a; Brumm, Mickelson et al., 2006), and doing so requires a clear picture of students' actual competencies.
The Accounting group found that ChatGPT was able to complete basic ratios, correctly calculating three of four, which would have earned a score of 8 out of 10. For more sophisticated questions, such as undertaking acquisition analysis, passing consolidation journal entries with narrations and determining non-controlling interest under group accounting, the responses lacked information pertinent to answering such questions. Interestingly, when asked to calculate the effect of understated assets on a firm's non-controlling interests, the AI refused, citing an ethical breach in financial reporting. While this is a useful note of caution for students, there are real-world applications of this task that accountants must perform, for example, when preparing firms for financial audits. ChatGPT thus seems unable to discern when an ostensibly 'unethical' action may serve a wider purpose, and cannot contextualise such actions when needed. This again raises questions about how students can build their competencies within this context. As highlighted in the literature (Brumm, Mickelson et al., 2006; Henri et al., 2017), CBL shifts from the traditional time-constrained system to a knowledge-based system in which students are expected to progress at their own pace while mastering what is expected.
Critical Thinking and Reasoning
Industry demands more competent and qualified employees equipped with critical and analytical competencies, and research shows that meeting these goals requires strategies such as CBL (Henri et al., 2017). This is particularly important as the number and types of competencies required of graduates are constantly changing (Sutcliffe et al., 2005). One of the essential skills required of today's graduates is critical thinking: the ability to make judgements clearly and rationally through the processing, engagement and evaluation of information. Given that critical judgements draw on many approaches and sources, including what the individual has learnt, known, understood, examined, experienced, seen or even heard, we examined ChatGPT's responses to assess its ability to make judgements supported by highly relevant examples and overall depth in a response.
Across all disciplines, ChatGPT was unable to understand context and generate answers that engaged meaningfully with appropriate case studies. In Management, it could not examine particular organisations; as a result of this lack of contextual understanding, the SWOT analysis required for the organisation in question did not take place. In Social Work and Law, the particular laws and regulations that social workers must abide by within an Australian context, or the relevant laws within particular contexts, were not understood or mentioned in any way. Within Social Work, the particular services that might help characters within case studies were very generalised and lacked the nuance required to meet the specific and complex needs that service users often have. In both Management and Social Work, the specific and necessary interventions were not proposed, rendering such responses unpassable. Education also found a complete lack of nuance in responses to the socio-cultural realities asked about in the research question for their assessment task.
One of the most exposing elements of such responses came when ChatGPT was required to generate critical reflections or responses to case studies, which usually present complex circumstances requiring discernment and sound judgement to answer reasonably. For Management, this meant a dearth of examples of key practices and interventions that would emulate best-practice principles. For Social Work, the critical reflection revolved around what a social worker ought to do in practice, rather than what the student actually did in, for example, a role play with another student enacting a typical interaction with a client. Within Social Work practice, critical reflection is a cornerstone of developing professional competence, and it was clear that ChatGPT had no grasp of a) the scenario it was asked to critically reflect upon, and b) the subjectivities the writer needs to draw upon in order to answer such questions satisfactorily. Education also found the lack of context a prevailing reality for assessment questions, which reduced the overall response quality and does little to help students apply their skills and knowledge to diverse contexts. ChatGPT seems competent at adhering to basic commands such as 'discuss', 'evaluate' and 'explain', though it seems unable to compare, evaluate, analyse and generate references in the service of higher-order skills. Where a directive appeared in an assessment question, the AI bot tended to focus on it to the exclusion of other details or directives that formed a subsequent part of the question. As such, the need for human thinking and more sophisticated thought seems to be beyond the bot's ability, at least for now.
One key limitation is the flow of ideas. In many parts, the response started with one idea and quickly jumped to another point without exhaustively discussing or explaining the previous one. This led to a lack of critical analysis, or to discussion that was not relevant to the context.
Structuring and Relevance of Response
One of the key advantages associated with CBL is assessment structuring. In CBL, instructors typically provide clear instructions about the competencies to be learned and assessed, which helps direct the design and structuring of learning materials and assessments (Baughman et al., 2012). This means students and instructors have a clear roadmap throughout the unit (Evans et al., 2015; Di Trapani & Clarke, 2012). In light of this, we assessed the structuring and relevance of the responses generated.
While ChatGPT was able to adhere to the generic structures required by some questions, it tended to repeat sections of the response, with an overall lack of continuity between one part and the next. This may have been influenced by the fact that the separate parts of the scaffold were entered into the AI one at a time, though the same tendency appeared when an entire question was entered at once. This lack of synthesis and argument development made responses poorer overall and did not demonstrate the higher levels of sophisticated thinking required, particularly when conducting research of any kind. The responses provided by ChatGPT are nonetheless a good starting place for students, whether they are developing research questions, surveying general trends or attitudes within a particular area of practice, or seeking broad information on a topic. There remains a clear need for relevant and contemporary referencing, as well as more nuanced and critical thinking that demonstrates the capacity of the human mind to make connections, link subtleties, and present cohesive and convincing arguments in ways that the AI bot cannot.
When asked to produce deadlines for an Education research project, ChatGPT ambitiously stated that it would complete 300 parent interviews in both English and Spanish (a nod to the America-centric nature of the app) within four weeks, which seemed quite implausible to the research team. For the Social Work team, there appeared a rather unhelpful response: "Working with vulnerable populations or on sensitive issues may have led to feelings of empathy and compassion, while working with challenging individuals may have resulted in frustration or burnout". Apart from the naivety of such a statement, this reductionist approach to the emotional responses of social workers shows no meaningful reflection on, or subjective response to, what a social work student may experience when working with such service users, which is precisely what the question was asking. The statement implies that such reactions are automatically aroused simply because a population is vulnerable, or that challenging individuals similarly lead to frustration or burnout among social workers. This emotional oversimplification is something to look out for in future assessments or AI-generated manuscripts, as it lacks the nuance and the personal, subjective experience that our diverse student cohorts bring.
Across all discipline groups, ChatGPT produced a dearth of relevant and contemporary literature, and the references it did provide often did not exist when the research team searched for them. When a generic assessment question is entered without specific instruction on how many sources are required, ChatGPT typically provides no references at all. This may offer students an unhelpful reversal of the assessment process: the AI bot makes statements, which students then need to source to fulfil the referencing requirements for their assessments. In effect, students may complete no meaningful literature searches at all across an assessment, subject or course, simply finding isolated statements that match what ChatGPT has produced. Even where references were provided, none of them was correct, indicating that students will have to verify every piece of information the chatbot provides if they want to use it.
While the initial ChatGPT-produced responses were not properly structured across any of the disciplines, the second round of responses was better structured because the questions were scaffolded. Scaffolding is considered one of the best strategies in CBL, providing students with directions that reduce the complexity of the task (Belland et al., 2015). Our findings above suggest that such strategies also make the task easier for ChatGPT, producing better responses. As stated in the literature, such scaffolding should be complemented with other strategies, such as fieldwork or experiential learning, in which students are required to apply knowledge from written assessments in real-world situations (Bensah et al., 2011; Evans et al., 2015).
Implications
The emergence of OpenAI's ChatGPT has several implications for the education sector. Despite the weaknesses and limits of the responses it generates, ChatGPT is far more capable than its predecessors in its potential threat to academic integrity. Students can use it to plan answers to questions that require descriptions and outlines of specific content (Yorio, 2023), and it can formulate responses that fit specific response types and writing styles within minutes (Dowling & Lucey, 2023). Furthermore, students can use the AI to help plan and locate information for specific assessment tasks, both formal and informal. Based on the marked responses of all the groups, its strengths lie in structure, language conventions, and locating and summarising relevant information, at least at a surface level. Given the widespread media attention ChatGPT has received, the number of users will inevitably increase; this will build collective knowledge of ChatGPT and lead to greater ease of use as the collective skill set and overall knowledge of the software grow (Joo et al., 2018).
Everyone involved in the higher education sector must eventually adapt to the use of ChatGPT and adjust assessment methodologies accordingly. It may also prompt some long overdue revisions of assessment in the education sector. If the core reason for assessment is considered at the design stage, educators can account for the use of ChatGPT and eliminate risks (Dowling & Lucey, 2023). For example, if the assessment is formative in nature and is designed to navigate students through a process of learning new content or skills, the design could account for the use of ChatGPT by including mandatory interaction between the educator and the student, providing a continuous and fluid demonstration of learned skills or content. This does not necessarily mean assessment needs to be conducted solely via interviews or in-class tests; it simply suggests that educators need to be aware of students' progress and the tools available to them, including ChatGPT, and differentiate assessment and learning processes as they see fit. Higher education institutions may also use practice-based assessment alternatives to counter the potential downsides of academic cheating and misconduct associated with the use of ChatGPT. Further, encouraging students who use the AI system to acknowledge its contribution to their assignments could improve academic openness and integrity.
The limitations of this study provide pointers for future research. First, this study has not investigated the factors that determine students' use of applications like ChatGPT, from the students' perspective. Forthcoming studies can apply relevant theories, such as the Technology Acceptance Model, to investigate the antecedents of ChatGPT use among students. This information will be useful for educational institutions and policymakers in understanding the motivation for ChatGPT use and informing decisions on assessment design. Second, this study has not undertaken comparison across different groups of people or contexts to understand the extent to which contextual factors influence the use of ChatGPT. Future studies should consider comparing the use of ChatGPT across groups or contexts to provide a more comprehensive understanding of ChatGPT use and the impact of contextual factors. Third, this study has not applied a longitudinal approach, especially given that ChatGPT is relatively new. Upcoming studies can apply a longitudinal approach to investigate the use of applications such as ChatGPT. This will help distinguish users who continue to use these applications from those who stop after a while, providing insight into the reasons for stopping and continuing. Fourth, the subject disciplines across which the use of ChatGPT is assessed in this study are limited to Law, Accounting, Social Work, Management and Education. Future studies can consider extending the study to more disciplines to provide a broader analysis of the application of ChatGPT and enable comparison across disciplines to identify any contextual factors impacting its use. Fifth, our study did not obtain the perspectives of people who have used ChatGPT to generate and submit assignments, so we are unable to provide an account of their experiences.
Forthcoming studies should consider collecting data from participants who have used ChatGPT for writing essays or assessments to generate insight into the experience of using the chatbot.