Our knowledge of the world is a collective phenomenon, built from pieces of information gathered by exploring existing patterns or by purposefully manipulating conditions to reach a conclusive understanding. Every piece of information works like a brick in the edifice of science; it is therefore essential to ensure the quality of every piece. Study quality, according to Plonsky (2011, p. 5), is defined as “adherence to standards of empirical rigor, appropriateness, and transparency in study design, analysis, and reporting practices”, which in turn provides the means to evaluate and rely upon previous studies as building blocks for venturing into the exploration of the next unknowns.
According to Plonsky (2013), study quality is the combination of (a) adherence to standards of contextually appropriate methodological rigor in research practices and (b) transparent and complete reporting of those practices. Study quality investigations usually take the form of methodological syntheses that aim to (a) describe practices and identify the field’s methodological culture; (b) describe and evaluate results with a view to improving future research; (c) examine relationships among the facets of research; and (d) inspect changes over time (Plonsky & Gonulal, 2015). The present study both describes and evaluates adherence to the facets of quality and traces the changes in these facets over time.
Previous syntheses (e.g., Hu & Plonsky, 2019; Khany & Tazik, 2019; Larson-Hall, 2017; Larson-Hall & Plonsky, 2015; Norris et al., 2015; Plonsky, 2011, 2013, 2014a, 2014b; Plonsky & Gonulal, 2015) have addressed different statistical and methodological issues in the field, including examination of quality in (a) study design, (b) instrumentation, (c) statistical analyses, and (d) reporting practices. While most of the previous studies focused on a single aspect of quality, a comprehensive set of results was reported by Plonsky (2013, 2014a, 2014b). However, those studies were conducted on papers published from 1990 to 2010. Moreover, the previous studies, except for Khany and Tazik (2019), used a limited number of journals as sources for the evaluated papers, and none examined quality in local journals. In this study, we have examined experimental papers published in 10 Iranian journals from their beginnings (most started around 2010). We have also integrated the categories of assumption checking used by Hu and Plonsky (2019), the types of tests explored by Khany and Tazik (2019), the type and purpose of visual presentation in Larson-Hall (2017), and data sharing as emphasized by APA’s Journal Article Reporting Standards (2018) with the quality protocols used by Plonsky and Gass (2011) and Pagout and Plonsky (2017) to reach comprehensive results.
Finally, papers published in Iranian journals were examined as representative of research practices in an EFL context. Previous work on study quality was done almost exclusively on papers published in high-ranked international journals (e.g., The Modern Language Journal, Language Learning, and Studies in Second Language Acquisition in Larson-Hall, 2017; Language Learning and Second Language Research in Hu and Plonsky, 2019; Language Learning and Studies in Second Language Acquisition in Plonsky, 2013). This study evaluates study quality in locally published journals to depict how far they have come in adhering to the standards of quality that have been emphasized.
Review of the Literature
The terms research synthesis, research review, systematic review, and meta-synthesis, according to Cooper and Hedges (2009), have been used interchangeably in the literature. Such studies, despite their seemingly confusing analyses and intimidating appearance, essentially summarize and evaluate the differences among groups of numbers (Rosenthal & DiMatteo, 2001). Methodological synthesis, on the other hand, “seeks not only to describe but to evaluate and comment on the field’s practices with the intention to improve future research as well” (Plonsky & Gonulal, 2015, p. 12).
As a focus of meta-synthesis, study quality refers to adherence to standards of “contextually appropriate, methodological rigor in research” combined with “a transparent and complete reporting of such practices” (Plonsky, 2013, p. 657). As Plonsky (2011) asserts, numerous factors, depending on the context and focus of a primary study, may influence each individual study; however, assigning a weight to each of these factors seems an impossible task. That is why methodological meta-synthesis seems an appropriate means of evaluating these influencing factors.
The emphasis on study quality has accelerated through the work of Plonsky and his colleagues over the last decade (e.g., Gass, Loewen, & Plonsky, 2020; Hu & Plonsky, 2019; Norouzian & Plonsky, 2018; Plonsky, 2013, 2014a, 2014b; Plonsky, Egbert, & Laflair, 2015; Plonsky & Gass, 2011; Plonsky & Gonulal, 2015). Other scholars (e.g., Hudson & Llosa, 2015; Khany & Tazik, 2019; Larson-Hall, 2012, 2017; Norris, 2015) have added essential information and guidelines regarding adherence to the facets of quality. In what follows, we summarize the findings with respect to the five features that are the focus of this study: sampling, design, statistical tests, reporting practices and data sharing, and visual presentation of data.
With regard to sampling, the findings of previous studies (Plonsky, 2013, 2014b; Plonsky & Gass, 2011) suggest that a large proportion of L2 studies may lack the power required to yield statistically significant results, increasing the risk of Type II error. Moreover, the syntheses of Plonsky (2013, 2014b) showed that power analysis is rare (about 1%) in L2 papers; similar rates were reported in other studies (2% in Plonsky & Gass, 2011; 7% in Ziegler, 2013). Compounding this problem is the prevalence of convenience sampling in these papers, which in turn limits their generalizability. Reported results (Norris & Ortega, 2000; Plonsky, 2014b) indicate that the majority of participants in L2 research are young adult university students living in the USA, Western Europe, or East Asia whose first or second language is English. Therefore, no matter how adequately the sample is selected or how large the effect size is, there is no guarantee that the results generalize to many other contexts (Ortega, 2005, 2009).
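As an illustration of the a priori power analysis whose rarity these syntheses document, the required sample size for a two-group comparison can be approximated from the expected effect size, alpha, and desired power. The following sketch uses the standard normal approximation; all values are illustrative and not drawn from any reviewed study:

```python
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate per-group sample size for a two-sided independent-samples
    t-test, via the normal approximation (Cohen, 1988)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for two-sided alpha
    z_power = NormalDist().inv_cdf(power)          # z corresponding to desired power
    return 2 * ((z_alpha + z_power) / d) ** 2

# A "medium" effect (d = 0.5) requires roughly 63 participants per group
# to reach 80% power at alpha = .05; smaller expected effects need far more.
print(round(n_per_group(0.5)))
```

A study run with far fewer participants than this calculation suggests has little chance of detecting the effect even if it is real, which is precisely the power problem described above.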
With respect to design, several shortcomings of L2 research have been pointed out. Chaudron (2001), for example, notes low reliability, poor design, and the regular use of intact groups. Other studies (e.g., Plonsky, 2013; Plonsky & Gass, 2011) reported that only a small portion of classroom-oriented experimental research was actually conducted in a classroom environment. While studies (Gass, 2009; Plonsky, 2014a) have shown an increased reliance on quantitative data, some features, such as random selection and assignment, remain a concern. Plonsky (2013) reports that, in his twenty-year sample, only 47% of the studies used random assignment (37% individual and 10% group assignment) and only 38% used delayed posttests. However, the use of control groups, pretesting, and delayed posttesting has increased over time (Plonsky, 2014a).
Concerning statistical tests, the first issue relates to the power analysis addressed above, which directly affects the results of statistical tests. Next is the use of multiple statistical tests on the same data, which inflates the familywise Type I error rate unless the alpha level is adjusted, an issue regularly ignored in the social sciences (Wilkinson, 1999). Plonsky (2013) reports that 60% of the papers in his study used multiple tests. Khany and Tazik (2019) also reported that 78.77% of applied linguistics papers used basic statistical tests (e.g., descriptive statistics, chi-square, t-tests, and one-way ANOVA); moreover, they reported that the assumptions of these tests were checked in only 17% of cases. Similarly, Hu and Plonsky (2019) reported that 17% of the quantitative studies in their sample followed stringent standards (reporting all required assumptions) and 24% followed lenient standards (meeting one or more of the assumptions). A final concern is the over-reliance on null hypothesis significance testing (NHST) and the dichotomous interpretation of the p-value. The use of robust statistics (e.g., Larson-Hall, 2012) and of the “new statistics”, i.e., effect sizes and confidence intervals (e.g., Cumming, 2012; Norris, 2015), has been recommended as a result. However, Plonsky (2013) reports that SLA studies included 35 p-values on average, while effect sizes were reported in only 26% of cases and, strikingly, confidence intervals in only 5%.
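To make these concerns concrete, the sketch below shows a Bonferroni adjustment for multiple tests run on the same data, together with the recommended “new statistics”: Cohen’s d and a large-sample approximate confidence interval for it. The data and all numerical values are hypothetical, not taken from any study reviewed here:

```python
from statistics import NormalDist, mean, stdev

def bonferroni_alpha(alpha: float, k: int) -> float:
    """Divide the familywise alpha across k tests run on the same data."""
    return alpha / k

def cohens_d(x, y) -> float:
    """Standardized mean difference using a pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_sd = (((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2)
                 / (nx + ny - 2)) ** 0.5
    return (mean(x) - mean(y)) / pooled_sd

def d_ci(d: float, nx: int, ny: int, level: float = 0.95):
    """Large-sample approximate confidence interval for Cohen's d."""
    se = ((nx + ny) / (nx * ny) + d ** 2 / (2 * (nx + ny))) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se

# Five tests on one data set: test each at alpha = .01 rather than .05.
print(bonferroni_alpha(0.05, 5))  # 0.01

treatment = [14, 15, 17, 16, 18, 15, 17, 16]  # hypothetical scores
control   = [13, 14, 15, 13, 16, 14, 15, 14]
d = cohens_d(treatment, control)
low, high = d_ci(d, len(treatment), len(control))
print(f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Reporting the effect size and its interval, rather than only whether p crossed .05, conveys both the magnitude of the effect and the precision of the estimate.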
The next facet concerns reporting practices and data sharing. Larson-Hall and Plonsky (2015, p. 131) refer to the failure to report descriptive statistics as "a practice that harms our field as a whole" since it prevents secondary-level analysis (meta-analysis). Plonsky’s (2013) results show that the most frequently reported descriptive statistic was the sample size (reported in 99% of the articles), followed by means (77%). However, in 17% of the cases the mean was reported without its standard deviation, leaving only 60% of papers meeting the fundamental criteria for inclusion in meta-analyses. The next issue is the a priori determination of alpha, which requires a prior power analysis; as noted above, power analysis is rare in L2 papers. Setting the alpha level in advance was reported in only 22% of cases by Plonsky (2013), with similar rates (16–26%) in Plonsky (2014a). A further concern is the omission of statistical results: Plonsky’s (2013) synthesis showed that the p-value was not reported in 13% of his studies, and the exact value of p was reported in only 49% of the sample. Authors (in press) also reported that Iranian authors considered issues such as reporting reliability, validity, and inferential statistics to be highly associated with study quality. The final issue is data sharing. APA Ethical Standard 8.14 “stipulates that psychologists do not withhold their data from other competent professionals who seek to verify substantive claims” (Breckler, 2009, para. 7). Plonsky (2011) reports that only about one-third of his requests to study authors for descriptive statistics were fulfilled. Other studies also reported low rates of successful retrieval of raw data; for example, only 14% of authors replied to the data requests of Plonsky et al. (2015).
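Reporting the exact p-value (e.g., “p = .051”) rather than a dichotomous statement (“p > .05”) is straightforward once the test statistic and its degrees of freedom are known. A minimal sketch, assuming SciPy is available; the t-statistic and degrees of freedom are illustrative:

```python
from scipy.stats import t

def exact_two_tailed_p(t_stat: float, df: int) -> float:
    """Exact two-tailed p-value for a t statistic, supporting the
    reporting practice of giving p's exact value."""
    return 2 * t.sf(abs(t_stat), df)

# A hypothetical result: t(60) = 2.0 gives p just above .05,
# information that "p > .05" alone would obscure.
p = exact_two_tailed_p(2.0, 60)
print(f"p = {p:.3f}")
```

Exact values let readers and meta-analysts judge the evidence on a continuum instead of through a single significance cutoff.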
The final quality concern in the practice of L2 research is the use of visual presentation. Graphics are argued to be a necessary means of understanding and conveying research findings (Larson-Hall & Plonsky, 2015). Despite their importance, their use in L2 papers is concerning. Norris and Ortega (2000), for example, reported that graphic presentation did not appear in 46% of the papers they studied. Plonsky (2013) reported that about two-thirds (66%) of the studies he surveyed did not use visual displays. Similarly, Larson-Hall (2017) found fairly low percentages of graphical presentation in three well-known L2 journals: 24% in The Modern Language Journal, 34% in Language Learning, and 48% in Studies in Second Language Acquisition. She also reported that, among the papers that did present data graphically, a large proportion (70 to 79%) used either line graphs or bar plots.
Having reviewed the existing challenges in the literature, the present study aims both to describe and evaluate L2 papers published in Iran against the above-mentioned concerns and to identify changes over time. Accordingly, the following research questions are posed:
- How is study quality adhered to in Iranian L2 papers? What are the most-adhered-to and the most challenging areas?
- What quality aspects have changed over time in L2 papers published in Iranian journals?