Measurement is the knowledge that enables us to quantify traits, and measures are the instruments we employ to quantify them. Estimation refers to the calculations that produce our measures and ensure the accuracy of our quantification (Mohammad, 2022).
Since its inception, educational measurement has been the cornerstone of all educational practices, and assessment results are critical in determining the effectiveness of teaching and learning (Bijlsma, 2021; Mohammad, 2022). Although academics have criticized tests for measuring extremely small samples of behavior, tests may well remain important measures for a long time to come because other measures of behavior comparable to testing are difficult to obtain.
According to Dickson (2020), tests may be diagnostic, teacher-made, or standardized. The teacher uses diagnostic tests to obtain information about students' academic progress, and does so during the learning process by dividing the subject into units. Teacher-made tests (TMTs) can take the form of a comprehensive exam, a weekly test, a unit test, or a term-ending exam. Such tests are unique in that they are created by classroom teachers and measure learning outcomes related to a particular course, whereas standardized tests (STs) are broadly comprehensive tests administered at the beginning or end of an educational program (Mohammad, 2021).
However, many studies show that teachers' accountability for developing useful measurement tools has been questioned at all educational levels (Adeyemo, 2021; Quansah, 2019; Singh, 2022), even though many academics have rated the importance of TMTs above that of STs. Such teacher-related factors introduce errors into students' scores and thereby lessen the value of the test results. Accordingly, academics contend that TMTs must meet expectations for a set of theoretical attributes that are formally referred to as the test's psychometric properties and performances (Cordova, 2018; Mohammad, 2022; Suppiah, 2020).
Psychometric properties are concerned with the validity, reliability, objectivity, fairness, and usability of tests (Cordova, 2018; Mohammad, 2022). Of these attributes, validity and reliability are the core characteristics of a test (Espinoza, 2021). Although other aspects of validity are also significant in teaching and learning, content validity is the most pertinent in this context because it allows the success of each syllabus's and course's contents to be evaluated (Mohammad, 2022; Simachew, 2019). When content validity is lacking, students neglect to pay close attention to, study, and hone their skills in those courses, especially courses that focus on language and other skills that do not appear, or appear less frequently, on exams (Mekonnen, 2017; Simachew, 2019). Just as content validity is the most pertinent form of validity, internal consistency (IC) is the form of reliability most feasible in the classroom test context (Mohammad, 2022). It is applied not to one item but to groups of items that are thought to measure different aspects of the same construct.
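As an illustration of how internal consistency can be quantified across a group of items, the sketch below computes Cronbach's alpha, one widely used IC coefficient. The choice of coefficient is an assumption made here for illustration, since the text above does not name one; the function name `cronbach_alpha` and the sample scores are likewise hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (rows = students, columns = items).

    Assumption: alpha is used as the internal consistency coefficient;
    the source text does not specify which coefficient is intended.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("internal consistency needs at least two items")
    # Sample variance of each item (column) across students.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each student's total score.
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 4 students, 3 dichotomously scored items (1 = correct).
sample_scores = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(cronbach_alpha(sample_scores))
```

In this invented data set every student answers all three items the same way, so the items are perfectly consistent with one another; real classroom data would yield a lower value.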
Regarding test performances, difficulty level, discrimination power, and distractor effectiveness are the basic concerns (Suppiah, 2020). Satisfying these theoretical assumptions ensures the test's quality and that students are assessed properly, fairly, and objectively (Butakor, 2022).
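Two of these indices can be sketched in a few lines under classical item-analysis conventions. The sketch assumes that difficulty is the proportion of examinees answering an item correctly and that discrimination is the difference in that proportion between the highest- and lowest-scoring groups; the 27% group size is a common convention rather than something stated above, and the function names and sample responses are hypothetical.

```python
def item_difficulty(responses, item):
    """Proportion of examinees who answered `item` correctly."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=0.27):
    """Upper-group minus lower-group proportion correct for `item`.

    Assumption: groups are the top and bottom `fraction` of examinees
    ranked by total score (0.27 is a conventional default).
    """
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]
    p_upper = sum(row[item] for row in upper) / group_size
    p_lower = sum(row[item] for row in lower) / group_size
    return p_upper - p_lower


# Hypothetical data: 4 examinees, 3 dichotomously scored items (1 = correct).
sample_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for i in range(3):
    print(i, item_difficulty(sample_responses, i),
          item_discrimination(sample_responses, i, fraction=0.5))
```

Distractor effectiveness, which requires the raw option chosen by each examinee rather than right/wrong scores, is left out of this sketch.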
According to Lemecha's (2020) study, student evaluation (the grading system) in higher education institutions (HEIs) in Ethiopia followed the curriculum revisions, which were in turn shaped by various foreign influences. In the 1950s, under British influence on Ethiopian education, most exams were in essay format and the evaluation process used criterion-referenced scoring. In the 1960s, when the US had a significant influence on Ethiopian education, objective tests dominated. The norm-referenced assessment approach, popularized in the United States, dominated evaluation standards up until the fall of the Soviet Union. After that, and particularly from 2004 onward, problems with the validity, trustworthiness, fairness, and performance of HEI measures, and with students' academic performance, began to escalate (Lemecha, 2020; Metages, 2019; Tesfamariam, 2021).
Despite Ethiopian universities' efforts to improve the quality and relevance of higher education to the needs of the market and the nation's development, including the use of standardized curricula and the introduction of criterion-referenced and continuous assessments, the impact on students' competence has not been as great as anticipated (Eyob, 2022; Metages, 2019). These efforts leave a gap: they focus solely on the curriculum and the mode of assessment while ignoring the assumptions about the quality of the measurements.
The psychometric features and performances of TMTs in measuring students' academic performance in full-fledged courses are not clearly evident in research by Ethiopian scholars (Sewagegn, 2019). For instance, Metages' (2019) study examined how TMTs' content validity affected teachers' and students' perceptions, attitudes, and motivations, as well as the activities and choice of teaching-learning materials in the English language course at Ambo University. This leaves a gap because, in addition to content validity, TMTs also need reliability, an appropriate degree of difficulty, and discrimination power (Marjorie, 2016). Similarly, Motuma (2019) investigated the content validity of EFL TMTs at Ambo University. The overall result of his study revealed that the TMTs' content validity was extremely low (0.211), which indicates adverse consequences of the assessments for the course's teaching-learning process. The gap here is comparable to that in Metages' (2019) study: it is difficult to assess the quality of TMTs solely by looking at their content validity. The same gap appears in a similar study on the content validity of English language courses conducted by Simachew (2019) at Debre Work Preparatory School, East Gojjam, under the previous Ethiopian pre-university educational structure. In that study, the author himself suggested that other qualities of TMTs, such as reliability, practicability, other validity types, item difficulty, and discrimination power, could form an agenda for further research.
Since higher education is now regarded as a requirement for success in a technologically, economically, and politically advanced world, troubling questions about the tools used to measure students' academic performance in Ethiopian HEIs have prompted the researchers to conduct research in this area. To accomplish this, the researchers determined the study context by choosing one university from each of the three categories of Ethiopian public universities: Jimma, Wollega, and Ambo from established, new, and emerging universities, respectively. These universities were taken as representative of the others because Ethiopian public universities are almost uniform in their curricula and in the directives, policies, rules, and regulations on assessment and exam procedures established by the Ministry of Education (MOE) (Eyob, 2022; Metages, 2019).
The course studied, English language communicative skill-I, which is offered as a common course for all departments, was purposefully chosen because language is thought to be an important factor in teaching and learning (Le Thi, 2021) and because Ethiopian public universities use English as their medium of instruction. In keeping with this, numerous academics have investigated and argued that English continues to be taught in universities even when other languages are employed to teach the curriculum's content (Alghadri, 2019; Ehsan, 2019).
As a result, the researchers were motivated to conduct a baseline survey study on the psychometric properties and performances of English language communicative skill TMTs in measuring undergraduate students' academic performance in Ethiopian public universities. They also intended to conduct an interventional study on the same issue. To this end, the following research questions (RQs) were formulated:
1. What are the psychometric properties of TMTs in measuring students’ academic performance?
2. What are the performances of TMTs in measuring students’ academic performance?