Traditional Computed Indices of Text Ease/Difficulty
In traditional readability formulas, text readability is measured on the basis of sentence and word length. Flesch Reading Ease (Flesch, 1948) and Flesch-Kincaid Grade Level (Kincaid et al., 1975) are the main traditional approaches to scaling texts, providing a single metric of text ease/difficulty. Longer sentences and words with complex syntax tend to be more demanding on working memory; this easy-to-compute approach has therefore been shown to provide a good estimate of the reading time of a passage (Graesser et al., 2011). Among the studies on the validity of these formulas, Brown (1998) concluded that they are not robust predictors of L2 reading difficulty, while Greenfield (1999) found otherwise. Overall, although the simplicity, unidimensionality, and practicality of the traditional approaches are appealing, they do not address deeper levels of discourse (Connor et al., 2007; McNamara & Magliano, 2009; Rapp et al., 2007).
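Both formulas can be computed directly from counts of sentences, words, and syllables. The sketch below illustrates this; note that the vowel-group syllable counter is a naive approximation rather than the dictionary-based counting used in published tools, so its scores will differ slightly from theirs.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw   # Flesch (1948)
    grade_level = 0.39 * wps + 11.8 * spw - 15.59       # Kincaid et al. (1975)
    return reading_ease, grade_level
```

Both metrics rest on the same two ratios: longer sentences raise words-per-sentence and longer words raise syllables-per-word, which lowers Reading Ease and raises Grade Level.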
Multilevel Frameworks
Flesch-Kincaid Grade Level and Reading Ease are accepted by the educational community; however, deeper-level analysis is needed to assess the various levels of language and discourse. Over the years, researchers have identified multiple levels of comprehension and developed frameworks accordingly (e.g., Graesser et al., 1997; Kintsch, 1998; McNamara & Magliano, 2009; Pickering & Garrod, 2004; Snow, 2002). Inspecting a large body of literature on reading comprehension, Graesser et al. (2011, p. 224) identified five recurrent levels proposed in these frameworks: “(1) words, (2) syntax, (3) the explicit textbase, (4) the situation model (sometimes called the mental model), and (5) the discourse genre and rhetorical structure (the type of discourse and its composition).” Knowledge of vocabulary and familiarity with word structure have a significant effect on the amount of time spent reading and comprehending a text (Perfetti, 2007; Rayner et al., 2001). Graesser and McNamara (2011) divided the analysis of word characteristics into levels such as parts of speech, word frequency, psychological ratings, and semantic content. Syntax is also among the factors affecting the difficulty of a reading passage: syntactic analysis assigns parts of speech to words, groups them into phrases, and assigns tree structures to sentences (Jurafsky & Martin, 2008). Beyond words and syntax, the textbase level focuses on meaning (Kintsch, 1998). Analyses of co-reference, lexical diversity, and latent semantics are among the methods used at this level. For instance, in the textbase, co-reference connects propositions, clauses, and sentences that refer to the same person or thing (McNamara & Kintsch, 1996).
The presence or lack of such cohesion (referential cohesion or a referential cohesion gap) can affect the amount of time spent on reading and the level of difficulty in comprehending a text (O'Brien et al., 1998). Another level is the situation model, which refers to the narrative world or subject matter content of a text, involving objects, characters, space, actions, events, processes, and other details. The dimensions Zwaan and Radvansky (1998) considered for the situation model include causation, intentionality or goals, time, space, and protagonists; reading time and difficulty increase when a break occurs in one or more of these dimensions. Finally, genre defines the category of a text, whether narration, exposition, persuasion, or description, or their related subcategories (Biber, 1991). Different genres involve different levels of comprehension difficulty; for example, informational texts are more difficult to comprehend and recall than fiction (Graesser & McNamara, 2011).
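Referential cohesion of the kind described above can be approximated computationally. The sketch below is loosely inspired by argument-overlap measures of the sort Coh-Metrix reports (it is not that tool's actual implementation): it scores the proportion of adjacent sentence pairs that share at least one content-word stem, using a deliberately crude stoplist and suffix stripper.

```python
import re

# Minimal illustrative stoplist; a real tool would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "or", "but", "of", "to", "in",
             "is", "was", "it", "that", "this", "on", "for", "with"}


def content_stems(sentence: str) -> set[str]:
    """Lowercased content words reduced by a crude suffix-stripping stemmer."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return {re.sub(r"(ing|ed|es|s)$", "", w)
            for w in words if w not in STOPWORDS}


def adjacent_overlap(text: str) -> float:
    """Proportion of adjacent sentence pairs sharing at least one content stem."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(sentences) < 2:
        return 1.0
    pairs = list(zip(sentences, sentences[1:]))
    shared = sum(1 for a, b in pairs if content_stems(a) & content_stems(b))
    return shared / len(pairs)
```

A passage whose sentences keep referring to the same entities scores near 1.0; a passage that switches topic at every sentence boundary, creating the cohesion gaps discussed above, scores near 0.0.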
Automated text analysis
With recent advancements in technology, text analysis has become automated, which has reduced the challenges of text selection and made it more practical for educators. Automated text analysis has been made possible by synthesizing advances across disciplines and approaches including corpus linguistics, computational linguistics, psycholinguistics, discourse processing, and information retrieval (Graesser et al., 2004). Beyond helping educators provide texts at the appropriate level for their students’ learning, automated analysis of language can greatly benefit language assessment, and high-stakes assessment in particular. Several indices that contribute to the ease/difficulty of a reading text can be measured using the latest automated text analysis tools, most of which are freely available to researchers, teachers, and test developers.
Software for measuring text readability dates back to 1963, when Danielson and Bryan developed a computer program implementing readability formulas, including the Farr-Jenkins-Paterson measure (Danielson & Bryan, 1963). Later, word-processing applications such as Microsoft Word™ added the ability to calculate measures such as Flesch-Kincaid. Today, tools such as Coh-Metrix can measure text difficulty by focusing on different aspects of language and discourse. Moreover, natural language processing (NLP) tools such as TAALES and TAALED are currently available and provide various measures of lexical diversity, syntactic complexity, text cohesion, grammar, and sentiment.
Despite the importance of this topic in L2 reading, not enough attention has been paid to empirical studies in L2 contexts concerning the relationship between readability indices/formulas and the difficulty level of reading passages. A few studies have used automated text analysis tools to examine the relationships between text characteristics and reading comprehension scores (e.g., Crossley et al., 2008; Graesser et al., 2011; Hamada, 2015; Kim et al., 2020; Kim et al., 2018; Paribakht & Webb, 2016; Rupp et al., 2001). For instance, Crossley et al. (2007) examined three variables, namely the number of words per sentence, CELEX frequency, and argument overlap, in 32 cloze reading passages and found that all three correlated significantly with test takers’ scores and predicted reading difficulty. In another study, Crossley et al. (2011) compared the Coh-Metrix L2 Reading Index with traditional readability formulas to identify which formula best categorizes text levels; the results revealed that the Coh-Metrix L2 Reading Index was considerably more effective.
In another study, Nelson et al. (2012) assessed seven text difficulty metrics in predicting the difficulty of both narrative and informational passages across five sets of texts. For narrative texts, the metrics covering broader ranges of linguistic indices showed a stronger relationship with text difficulty; for informational texts, however, the metrics based on sentence length and word difficulty showed higher correlations.
Hamada (2015) examined the lexical, syntactic, and meaning construction indices in the Japanese Eiken English graded test. Overall, the results indicated that surface-level linguistic variables such as lexical and syntactic indices better predicted the difficulty of reading comprehension than the higher-level linguistic variables including meaning construction indices.
Also, Choi and Moon (2020) studied the relationships between 26 text features and the test difficulty of high-stakes English as a Foreign Language (EFL) reading and listening tests. Moderate to high correlations were found between the text features and the observed difficulty of the test sections. Vocabulary features such as type and token counts as well as variation features, syntactic features such as the mean number of clauses per sentence, and readability features showed significant correlations with difficulty level. However, the correlation between pragmatic features and difficulty level was not significant.
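Type and token counts of the kind used as vocabulary features are straightforward to compute. The sketch below shows the plain type-token ratio (TTR) together with root TTR (Guiraud's index), one common correction for TTR's sensitivity to text length; the specific variants used in the studies above may differ.

```python
import math
import re


def lexical_diversity(text: str) -> dict[str, float]:
    """Type and token counts with two simple lexical diversity ratios."""
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    return {
        "tokens": len(tokens),
        "types": len(types),
        "ttr": len(types) / len(tokens),                  # type-token ratio
        "root_ttr": len(types) / math.sqrt(len(tokens)),  # Guiraud's index
    }
```

Because every repeated word adds a token but not a type, plain TTR falls as texts grow longer even when vocabulary stays equally varied, which is why length-corrected variants such as root TTR are generally preferred when comparing passages of different lengths.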
The corpora chosen in previous studies included simplified news texts (Crossley et al., 2011), Bormuth graded cloze passages (Chall & Dale, 1995; Crossley et al., 2007), standardized EFL tests such as the TOEIC (Choi & Moon, 2020), and graded corpora such as the TASA corpus and the Japanese Eiken English graded tests (Graesser et al., 2011; Hamada, 2015), among others. In this study, however, we investigated a rather different corpus: the reading comprehension subsection of the English proficiency section of a larger test used for university admission purposes. The test is a high-stakes, large-scale university entrance exam for master’s programs, consisting of items that assess various subject matters depending on the field. We aimed to identify the factors that contribute most to the difficulty of the reading section of such an intensive, timed test with a negative scoring system.
Despite the importance of such a high-stakes exam, to the authors’ knowledge, no similar studies have been carried out, particularly on the Iranian National University Entrance Exam. Therefore, the main objective of this study is to measure the indices influencing difficulty in the reading passages and to investigate which factors best predict the test takers’ scores. From among all the indices in the literature, 14 factors and two readability formulas were selected.