Descriptive statistics for the scores from the automated scoring system Aim Writing and the human rater are presented in Table 3. The mean score from Aim Writing is 86.60, while that from the human rater is 84.87. The two means are close, with the automated scoring system's mean slightly higher than the human rater's.
Table 3 Descriptive Statistics

|         | Mean  | Std. Deviation | N  |
|---------|-------|----------------|----|
| Teacher | 84.87 | 4.439          | 30 |
| Web     | 86.60 | 4.709          | 30 |
Table 4. Paired Samples Correlations

|                       | N  | Correlation | Sig. |
|-----------------------|----|-------------|------|
| Pair 1: Teacher & Web | 30 | .580        | .001 |
A paired-samples t-test was conducted (see Tables 4 and 5). The scores rated by Aim Writing had a moderate correlation with those rated by the human rater, r = .58, p = .001, which suggests that Aim Writing’s rating criteria are not fully consistent with those of the human rater.
There was a significant difference between the scores rated by Aim Writing and those rated by the human rater, t = -2.26, df = 29, p < .05. Aim Writing tended to give higher scores than the human rater.
Table 5. Paired Samples Test

|                     | Mean   | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | t      | df | Sig. (2-tailed) |
|---------------------|--------|----------------|-----------------|--------------|--------------|--------|----|-----------------|
| Pair 1: Teacher-Web | -1.733 | 4.201          | .767            | -3.302       | -.165        | -2.260 | 29 | .032            |

Note. Mean, Std. Deviation, Std. Error Mean, and the 95% confidence interval (Lower, Upper) refer to the paired differences (Teacher - Web).
From the quantitative results, even though the same grading criteria were given to the human rater and Aim Writing, Aim Writing’s scores were higher than the human rater’s. In Aim Writing’s feedback, all corrections concerned grammar, vocabulary, and sentence structure, while the teacher’s feedback covered not only those aspects but also an evaluation of the content. Aim Writing’s higher scores might therefore result from this different emphasis in evaluating the writing.
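The reported test statistic can be sanity-checked from the summary statistics alone. The sketch below recomputes the standard error and t value in Table 5 from the published mean difference and standard deviation (a minimal check using only the reported figures, since the raw scores are not available):

```python
import math

# Summary statistics reported in Table 5 (paired differences, Teacher - Web)
mean_diff = -1.733  # mean of the paired differences
sd_diff = 4.201     # standard deviation of the paired differences
n = 30              # number of paired observations

# Standard error of the mean difference
se = sd_diff / math.sqrt(n)   # matches the reported Std. Error Mean, .767

# Paired-samples t statistic with df = n - 1 = 29
t_stat = mean_diff / se       # matches the reported t, -2.26

print(round(se, 3), round(t_stat, 2))  # → 0.767 -2.26
```

Both values agree with Table 5 up to the rounding of the published figures.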
Teachers using AES systems as tools to evaluate students’ writing need to pay attention to the corrections in order to gain a comprehensive understanding of the systems’ biases.
Qualitative responses
Time efficiency
The timing of feedback is a controversial topic among researchers. Some believe that immediate feedback is a means to prevent errors from being encoded into memory (Lee et al., 2013), while others argue that delayed feedback reduces proactive interference so that the corrective information can be encoded without interference from the initial error (Ravand & Rasekh, 2011). For EFL learners’ writing tasks, written feedback provided in a timely manner greatly influences student learning (Basey et al., 2014). The AES system can provide immediate and continual feedback on essay content based on statistical techniques.
The participants reflected on when they received feedback from Aim Writing and from the instructor. Aim Writing provided instant feedback as soon as users submitted their essays on the input page. In contrast, participants reported that they usually did not receive the instructor’s feedback until a week later. Several students pointed out that the timing of feedback affected their willingness to revise their essays. For example:
I think feedback from the instructor was slower than Aim Writing. Usually, I wouldn’t want to revise my essay if I receive feedback for more than one week. I think the teacher’s feedback should be delivered within two days. (S04)
I can get immediate feedback from Aim Writing so that I know what the errors are in my own essay. It is a good experience. I know my mistakes and I can correct them right on the spot. (S06)
Communicative competence
Communicative competence comprises the language user’s grammatical knowledge of syntax, morphology, and phonology, together with the social knowledge of how and when to use utterances appropriately. Peter and Chomsky (1968) referred to competence as the “linguistic system” that the language user has internalized for the perception and production of speech. Savignon (1983) proposed a communicative competence model consisting of grammatical competence, discourse competence, socio-cultural competence, and strategic competence to guide language learning.
Aim Writing contributed to the improvement of grammatical competence among English language learners, especially lower-level learners. It could also provide contextually appropriate vocabulary choices for students.
I feel more confident in my grammar because the grammar mistakes Aim Writing pointed out were what I usually ignored. After I paid special attention to them, my grammar was better. (S03)
Aim Writing, serving as an instant grammar-correction tool, can substantially enhance students’ grammatical knowledge and prompt them to reflect on their mistakes so as to avoid repeating them in the future.
The teacher’s written feedback also attended to this aspect. However, only students with higher English proficiency recognized the teacher’s instruction on grammar.
I agree more with the teacher’s feedback on the choice of words because she considered the context and encouraged me to use the words and phrases we newly learned. I can remember them after repeated practice. Aim Writing’s suggestions were useful, but the words it offered were sometimes hard, and I couldn’t remember them several days later. (S04)
Both Aim Writing’s and the teacher’s feedback on communicative competence were recognized by the participants, but their effectiveness varied with the students’ English ability.
Feedback focus
The participants stated the differences in feedback focus between Aim Writing and the instructor. They pointed out that Aim Writing mostly presented corrective feedback on grammatical errors, including choice of words, tenses, and pronouns, which was helpful not only in increasing the clarity of the essay but also in improving their self-efficacy in writing. The feedback could indicate the error as well as provide a corrected version for users to consider:
Aim Writing clearly pointed out which part was missed in the sentence and gave suggestions on adding specific words. (S02)
Aim Writing could suggest I use an alternative word to be more accurate, and when I wrote a similar sentence in another situation, I could still remember the suggested word. (S03)
Aim Writing helped me avoid low-level grammatical mistakes. (S10).
The participants pointed out that the instructor’s feedback contained grammar corrections but focused more on the organization of arguments. Researchers found that Chinese teachers showed a stronger focus on correcting the use of grammar and vocabulary (Cheng et al., 2021). However, researchers tend to agree that the strategy and influence of teacher feedback are context-specific.
The teacher was not able to point out every grammatical mistake in my essay but could provide suggestions on the arguments. For example, one piece of feedback was to add an example to support the thesis statement. However, Aim Writing wouldn’t tell me what kind of content to add, and its comments on the writing were abstract; that is to say, I didn’t know how to further enrich my content based on the comment. (S01)
The instructor’s feedback focused more on the ideas presented by us. The teacher usually brought up suggestions on my essay structure, posed some questions to me, and encouraged me to think about the logic between sentences and paragraphs. I could talk to my teacher about my thoughts on revision. Aim Writing did not have these functions. (S04)
The instructor could circle some grammar mistakes in my essay, but maybe due to fatigue in grading, she could not be as efficient as Aim Writing. There were cases in which she did not point out spelling mistakes and inappropriate word use. (S07)
Previous research indicated that teachers’ feedback has an advantage over other kinds of feedback in improving students’ language proficiency in both grammar and meaning-level issues and content (Ruegg, 2015).
The participants contended that the instructor’s feedback was individualized and thus carried more weight in promoting their writing proficiency than Aim Writing’s feedback. The teacher’s feedback drew on an evaluation of organization, grammar, and content in accordance with the prompt, so the participants felt they could become more proficient in writing, as these parts account for a larger proportion of scores in standardized English tests:
The scores for my three essays were almost the same, and the comments were too. I found a problem with Aim Writing, that is, it cannot judge whether the examples I used or the arguments I stated were appropriate to answer the questions in the prompt. There is no space for me to tell the machine what question I am going to answer or what the prompt for this writing is. (S10)
One of the participants’ concerns was the correctness of the grammar errors pointed out by Aim Writing. The participants noted that grammar correction was the most intuitive evaluation the system gave, and that its grammar corrections and vocabulary suggestions were the most helpful. However, because the system cannot judge according to the specific context of students’ writing, two of them raised concerns about pseudo “grammar errors” in article use and about vocabulary suggestions that misled their revisions.
Students with weaker grammar knowledge tended to accept all the grammar advice given by Aim Writing, while those with stronger grammar noticed that the grammar or vocabulary advice sometimes did not fit the context:
The alternative word suggested by Aim Writing was not appropriate in the context of my essay. I looked it up in the English dictionary, and it was not right. (S08)
I think some feedback on the grammar aspect was of no use to me because I didn’t think there was any mistake in my sentences. Why would I change? I compared it with the instructor’s feedback, and the teacher didn’t mark the same sentence, so I believed I was right. (S02)
Another concern was that the summative comments at the end of the essay were homogeneous and less helpful for revision. Aim Writing provided an overall grade on participants’ essays and offered comments on three aspects: vocabulary, sentences, and discourse. However, the language of the comments was very abstract, for example, "discourse is not in-depth, and not convinced". Most students could not make any concrete improvements based on such wording.
I found the comments in each of my essays were almost the same. For example, in the vocabulary part, the comments were “the words used in the writing are not advanced”. I know some of the words I used are very common, but how I can revise is not pointed out by Aim Writing. (S08)
Preferred feedback model
The preferred feedback model for most of the participants was a combination of feedback from Aim Writing and the instructor. However, a few preferred only the teacher’s feedback:
I would prefer the teacher’s feedback first and upload the writing to an AES system to get another version of feedback. (S09)
I think I can seek Aim Writing for help because it can point out grammar mistakes, and then the teacher can provide feedback on content and structure. (S07)
This system is of limited help to me. I would only consider the feedback from Aim Writing and compare both feedback (from Aim Writing and the instructor) in grammar and see which one makes sense to me. (S02)
Participants proposed three models of feedback (Table 6).
Table 6 Feedback models

| Models of feedback | Phase one             | Phase two        |
|--------------------|-----------------------|------------------|
| 1                  | Teacher feedback      | AES feedback     |
| 2                  | AES feedback          | Teacher feedback |
| 3                  | Compare both feedback |                  |
Participants who preferred the first feedback model mostly had high English proficiency. They indicated that teacher feedback was more helpful for communicative competence. After receiving teacher feedback, they had a deeper understanding of the writing’s structure and logic, and their self-efficacy in writing was enhanced. They also contended that the AES system could provide alternative words to enrich their vocabulary variety, so they preferred to use the AES system as a polishing tool.
Most of the participants advocated that the AES system should come first because they could revise their grammar mistakes immediately, making their writing more fluent in meaning. After this step, the teacher could give feedback based on the grammatically corrected version. In this way, the teacher could allocate more time to the content and offer detailed suggestions rather than spending much time correcting grammar mistakes. In this model, students’ needs in writing can be better satisfied.
A few students suggested the third model. They preferred to wait for both sets of feedback, compare the corrections, and select the merits of both sources. This feedback model would be effective if the instructor could provide feedback within three days.