Role of expert judgement in language test validation
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
Abstract
The calibration of test materials generally involves an interaction between empirical analysis and expert judgement. This paper explores the extent to which scale familiarity might affect expert judgement as a component of test validation in the calibration process. It forms part of a larger study that investigates the alignment of the LanguageCert suite of tests, the Common European Framework of Reference (CEFR), the China Standards of English (CSE) and China’s College English Test (CET).
In the larger study, Year 1 students at a prestigious university in China were administered two tests: one with items based on China’s College English Test (CET), and the other a CEFR-aligned test developed by LanguageCert (the LTE). Comparable sections of the CET and the LTE comprised sets of discrete items targeting lexico-grammatical competence.
To ascertain whether expert judges were equally comfortable placing test items on either scale (CSE or CEFR), a group of professors from the university in China who set the CET-based test were asked to judge the CET items against the nine CSE levels, with which they were very familiar. They were then asked to judge the LTE items against the six CEFR levels, with which they were less familiar. Both sets of expert ratings and the test taker responses on both tests were then calibrated within a single frame of reference and located on the LanguageCert scale.
In the analysis of the expert ratings, the CSE-familiar raters exhibited higher levels of agreement with the empirically derived score levels for the CET items than they did for the equivalent LTE items. This supports the proposition that expert judgement may be used in the calibration process where the experts in question have a strong knowledge of both the test material and the standards against which the test material is to be judged.
Keywords: expert judgement, test validation, reading and usage, CEFR, CSE