Descriptive statistics for the scores from the automated scoring system Aim Writing and the human rater are presented in Table 3. The mean score from Aim Writing is 86.60, while that from the human rater is 84.87. The two means are close, with the automated scoring system's mean slightly higher than the human rater's.
Table 3 Descriptive Statistics

|         | Mean  | Std. Deviation | N  |
|---------|-------|----------------|----|
| Teacher | 84.87 | 4.439          | 30 |
| Web     | 86.60 | 4.709          | 30 |
Table 4. Paired Samples Correlations

|                       | N  | Correlation | Sig. |
|-----------------------|----|-------------|------|
| Pair 1: Teacher & Web | 30 | .580        | .001 |
A paired-samples t-test was conducted (see Tables 4 and 5). The scores rated by Aim Writing had a moderate correlation with those rated by the human rater, r = .58, p = .001, which suggests that Aim Writing’s rating criteria are not fully consistent with those of the human rater.
There was a significant difference between the scores rated by Aim Writing and those rated by the human rater, t = -2.26, df = 29, p < .05. Aim Writing tended to give higher scores than the human rater.
Table 5. Paired Samples Test

|                     | Mean   | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | t      | df | Sig. (2-tailed) |
|---------------------|--------|----------------|-----------------|--------------|--------------|--------|----|-----------------|
| Pair 1: Teacher-Web | -1.733 | 4.201          | .767            | -3.302       | -.165        | -2.260 | 29 | .032            |

Note. Mean, Std. Deviation, Std. Error Mean, and the 95% confidence interval (Lower, Upper) refer to the paired differences (Teacher - Web).
From the quantitative results, even though the same grading criteria were given to the human rater and Aim Writing, Aim Writing’s scores were higher than the human rater’s. In Aim Writing’s feedback, all corrections concerned grammar, vocabulary, and sentence structure, while the teacher’s feedback covered not only those aspects but also an evaluation of the content. Aim Writing’s higher scores might therefore result from this different emphasis in evaluating the writing.
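The reported test statistic can be sanity-checked from the summary statistics alone. The sketch below recomputes the standard error and t value in Table 5 from the published mean difference and standard deviation (a minimal check using only the reported figures, since the raw scores are not available):

```python
import math

# Summary statistics reported in Table 5 (paired differences, Teacher - Web)
mean_diff = -1.733  # mean of the paired differences
sd_diff = 4.201     # standard deviation of the paired differences
n = 30              # number of paired observations

# Standard error of the mean difference
se = sd_diff / math.sqrt(n)   # matches the reported Std. Error Mean, .767

# Paired-samples t statistic with df = n - 1 = 29
t_stat = mean_diff / se       # matches the reported t, -2.26

print(round(se, 3), round(t_stat, 2))  # → 0.767 -2.26
```

Both values agree with Table 5 up to the rounding of the published figures.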
Teachers using AES systems as tools to evaluate students’ writing need to pay attention to the corrections in order to gain a comprehensive understanding of the systems’ biases.
Qualitative responses
Time efficiency
The timing of feedback is a controversial topic among researchers. Some believe that immediate feedback is a means to prevent errors from being encoded into memory (Lee et al., 2013), while others argue that delayed feedback reduces proactive interference so that the corrective information can be encoded without interference from the initial error (Ravand & Rasekh, 2011). For EFL learners’ writing tasks, written feedback provided in a timely manner greatly influences student learning (Basey et al., 2014). The AES system can provide immediate and continual feedback on essay content based on statistical techniques.
The participants reflected on when they received feedback from Aim Writing and from the instructor. Aim Writing provided instant feedback as soon as users submitted their essays on the input page. In contrast, participants reported that they usually did not receive the instructor’s feedback until a week later. Several students pointed out that the timing of feedback affected their willingness to revise their essays. For example:
I think feedback from the instructor was slower than Aim Writing. Usually, I wouldn’t want to revise my essay if I receive feedback for more than one week. I think the teacher’s feedback should be delivered within two days. (S04)
I can get immediate feedback from Aim Writing so that I know what the errors are in my own essay. It is a good experience. I know my mistakes and I can correct them right on the spot. (S06)
Communicative competence
Communicative competence comprises the language user’s grammatical knowledge of syntax, morphology, and phonology, together with the social knowledge of how and when to use utterances appropriately. Peter and Chomsky (1968) referred to competence as the “linguistic system” that the language user has internalized for the perception and production of speech. Savignon (1983) proposed a communicative competence model consisting of grammatical competence, discourse competence, socio-cultural competence, and strategic competence to guide language learning.
Aim Writing contributed to the improvement of grammatical competence among English language learners, especially lower-level learners. It could also provide contextually appropriate vocabulary choices for students.
I feel more confident in my grammar because the grammar mistakes Aim Writing pointed out were what I usually ignored. After I paid special attention to them, my grammar was better. (S03)
Aim Writing, serving as an instant grammar-correction tool, can substantially enhance students’ grammatical knowledge and prompt them to reflect on their mistakes so as to avoid repeating them in the future.
The teacher’s written feedback also attended to this aspect. However, only students with higher English proficiency recognized the teacher’s instruction on grammar.
I agree more with the teacher’s feedback on the choice of words because she considered the context and encouraged me to use the words and phrases we newly learned. I can remember them after repeated practice. Aim Writing’s suggestions were useful, but the words it offered were sometimes hard, and I couldn’t remember them several days later. (S04)
Both Aim Writing’s and the teacher’s feedback on communicative competence were recognized by the participants, but their effectiveness varied with the students’ English ability.
Feedback focus
The participants stated the differences in feedback focus between Aim Writing and the instructor. They pointed out that Aim Writing mostly presented corrective feedback on grammatical errors, including choice of words, tenses, and pronouns, which was helpful not only in increasing the clarity of the essay but also in improving their self-efficacy in writing. The feedback could indicate the error as well as provide a corrected version for users to consider:
Aim Writing clearly pointed out which part was missed in the sentence and gave suggestions on adding specific words. (S02)
Aim Writing could suggest I use an alternative word to be more accurate, and when I wrote a similar sentence in another situation, I could still remember the suggested word. (S03)
Aim Writing helped me avoid low-level grammatical mistakes. (S10).
The participants pointed out that the instructor’s feedback contained grammar corrections but focused more on the organization of arguments. Researchers found that Chinese teachers showed a stronger focus on correcting the use of grammar and vocabulary (Cheng et al., 2021). However, researchers tend to agree that the strategy and influence of teacher feedback are context-specific.
The teacher was not able to point out every grammatical mistake in my essay but could provide suggestions on the arguments. For example, one piece of feedback was to add an example to support the thesis statement. However, Aim Writing wouldn’t tell me what kind of content to add, and its comments on the writing were abstract; that is to say, I didn’t know how to further enrich my content based on the comment. (S01)
The instructor’s feedback focused more on the ideas presented by us. The teacher usually brought up suggestions on my essay structure, posed some questions to me, and encouraged me to think about the logic between sentences and paragraphs. I could talk to my teacher about my thoughts on revision. Aim Writing did not have these functions. (S04)
The instructor could circle some grammar mistakes in my essay, but maybe due to fatigue in grading, she could not be as efficient as Aim Writing. There were cases in which she did not point out spelling mistakes and inappropriate word use. (S07)
Previous research indicated that teachers’ feedback has an advantage over other kinds of feedback in improving students’ language proficiency in both grammar and meaning-level issues and content (Ruegg, 2015).
The participants contended that the instructor’s feedback was individualized and thus carried more weight in promoting their writing proficiency than Aim Writing’s feedback. The teacher’s feedback drew on an evaluation of organization, grammar, and content in accordance with the prompt, so the participants felt they could become more proficient in writing, as these parts account for a larger proportion of scores in standardized English tests:
The scores for my three essays were almost the same, and the comments were too. I found a problem with Aim Writing, that is, it cannot judge whether the examples I used or the arguments I stated were appropriate to answer the questions in the prompt. There is no space for me to tell the machine what question I am going to answer or what the prompt for this writing is. (S10)
One of the participants’ concerns was the correctness of the grammar errors pointed out by Aim Writing. The participants noted that grammar correction was the most intuitive evaluation the system gave, and that its grammar corrections and vocabulary suggestions were the most helpful. However, because the system cannot judge according to the specific context of students’ writing, two of them raised concerns about pseudo “grammar errors” in article use and about vocabulary suggestions that misled their revisions.
Students with weaker grammar knowledge tended to accept all the grammar advice given by Aim Writing, while those with stronger grammar noticed that the grammar or vocabulary advice sometimes did not fit the context:
The alternative word suggested by Aim Writing was not appropriate in the context of my essay. I looked it up in the English dictionary, and it was not right. (S08)
I think some feedback on the grammar aspect was of no use to me because I didn’t think there was any mistake in my sentences. Why would I change? I compared it with the instructor’s feedback, and the teacher didn’t mark the same sentence, so I believed I was right. (S02)
Another concern was that the summative comments at the end of the essay were homogeneous and less helpful for revision. Aim Writing provided an overall grade on participants’ essays and offered comments on three aspects: vocabulary, sentences, and discourse. However, the language of the comments was very abstract, for example, "discourse is not in-depth, and not convinced". Most students could not make any concrete improvements based on such wording.
I found the comments in each of my essays were almost the same. For example, in the vocabulary part, the comments were “the words used in the writing are not advanced”. I know some of the words I used are very common, but how I can revise is not pointed out by Aim Writing. (S08)
Preferred feedback model
The preferred feedback model for most of the participants was a combination of feedback from Aim Writing and the instructor. However, a few preferred only the teacher’s feedback:
I would prefer the teacher’s feedback first and upload the writing to an AES system to get another version of feedback. (S09)
I think I can seek Aim Writing for help because it can point out grammar mistakes, and then the teacher can provide feedback on content and structure. (S07)
This system is of limited help to me. I would only consider the feedback from Aim Writing and compare both feedback (from Aim Writing and the instructor) in grammar and see which one makes sense to me. (S02)
Participants proposed three models of feedback (Table 6).
Table 6 Feedback models

| Models of feedback | Phase one             | Phase two        |
|--------------------|-----------------------|------------------|
| 1                  | Teacher feedback      | AES feedback     |
| 2                  | AES feedback          | Teacher feedback |
| 3                  | Compare both feedback |                  |
Participants who preferred the first feedback model mostly had high English proficiency. They indicated that teacher feedback was more helpful for communicative competence. After receiving teacher feedback, they had a deeper understanding of the writing’s structure and logic, and their self-efficacy in writing was enhanced. They also contended that the AES system could provide alternative words to enrich their vocabulary variety, so they preferred to use the AES system as a polishing tool.
Most of the participants advocated that the AES system should come first because they could revise their grammar mistakes immediately, making their writing more fluent in meaning. After this step, the teacher could give feedback based on the grammatically corrected version. In this way, the teacher could allocate more time to the content and offer detailed suggestions rather than spending much time correcting grammar mistakes. In this model, students’ needs in writing can be better satisfied.
A few students suggested the third model. They preferred to wait for both sets of feedback, compare the corrections, and select the merits of both sources. This feedback model would be effective if the instructor could provide feedback within three days.