Sickinger, Beci and Brunfaut, Tineke and Pill, John (2025) An Exploration of Comparative Judgement for Evaluating Writing Performances of the Austrian Year 8 Test for English as a Foreign Language. PhD thesis, Lancaster University.
2023SickingerPhD.pdf - Published Version
Restricted to Repository staff only until 28 November 2025.
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.
Abstract
Comparative judgement (CJ) is an evaluation method whereby a rank order is constructed from judges’ pairwise comparisons of performances. CJ has been shown to be reliable and practical in various contexts, but it is currently under-researched and under-utilised in second language (L2) testing and as a method for evaluating performance dimensions/criteria independently. The present thesis investigated the use of CJ to evaluate lower-secondary school English as a Foreign Language (EFL) written performances from a national test in Austria. The study used a mixed-methods research design and consisted of two strands. In Strand 1, 27 participants (Austrian EFL educators) evaluated 300 EFL scripts using CJ: once holistically (judging all aspects of one script against all aspects of a second script) and once by each of a set of dimensions/criteria (judging the features of one performance dimension/criterion in one script against the equivalent features in a second script). Additionally, the participants rated the scripts using an analytic rating scale (the conventional rating method). CJ was found to be a reliable method of evaluating EFL scripts in all judgement sessions (scale separation reliability, the CJ reliability measure adapted from Rasch modelling, ≥ .89). Teachers with experience of evaluating similar scripts were reliable judges (infit values ≤ 1.5) and were more reliable when using CJ than when rating. Participants reported considering a broader range of writing features when rating than when using CJ. In Strand 2, think-aloud protocols were collected from eight participants while they judged scripts with CJ. Findings indicated two approaches to the CJ decision-making process for EFL lower-secondary scripts: one reflecting a more traditional rating approach, the other a quicker, reliable approach tailored to CJ. Overall, the thesis suggests that CJ can be a reliable and time-efficient evaluation method for EFL writing when used by trained raters and/or teachers with experience of evaluating written performances in the classroom.
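The abstract describes how CJ derives a rank order from pairwise comparisons and reports scale separation reliability, a measure analogous to Rasch person separation reliability. As a rough, self-contained illustration only (not the thesis's actual analysis or data), the Python sketch below fits a Bradley–Terry-style model to simulated pairwise judgements, produces a rank order of scripts, and computes an SSR-style index; the script counts, comparison counts, and function names are invented for illustration.

```python
"""Minimal sketch, assuming simulated data: rank ordering scripts from
pairwise comparisons and computing a scale-separation-style reliability."""

import math
import random


def fit_bradley_terry(n_items, comparisons, iters=200):
    """Estimate item strengths from (winner, loser) pairs using the
    classic MM / Zermelo update; returns logit-scale measures."""
    wins = [0] * n_items
    pair_count = [[0] * n_items for _ in range(n_items)]  # times each pair met
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_count[winner][loser] += 1
        pair_count[loser][winner] += 1

    strength = [1.0] * n_items
    for _ in range(iters):
        new = []
        for i in range(n_items):
            denom = sum(
                pair_count[i][j] / (strength[i] + strength[j])
                for j in range(n_items) if j != i and pair_count[i][j]
            )
            # tiny smoothing keeps zero-win items finite on the log scale
            new.append((wins[i] + 0.01) / denom if denom else strength[i])
        gm = math.exp(sum(math.log(s) for s in new) / n_items)  # fix the scale
        strength = [s / gm for s in new]
    return [math.log(s) for s in strength]  # logit-scale measures


def scale_separation_reliability(measures, comparisons):
    """SSR-style index: (observed variance - mean error variance) /
    observed variance, with error variances from the Fisher
    information contributed by each item's comparisons."""
    n = len(measures)
    info = [0.0] * n
    for a, b in comparisons:
        p = 1.0 / (1.0 + math.exp(-(measures[a] - measures[b])))
        info[a] += p * (1 - p)
        info[b] += p * (1 - p)
    se2 = [1.0 / i for i in info]                      # error variance per item
    mean = sum(measures) / n
    obs_var = sum((m - mean) ** 2 for m in measures) / (n - 1)
    return (obs_var - sum(se2) / n) / obs_var


if __name__ == "__main__":
    random.seed(1)
    n_scripts = 30                                      # invented, not the 300 real scripts
    true_quality = [random.gauss(0, 1) for _ in range(n_scripts)]
    comparisons = []
    for _ in range(600):                                # simulated judge decisions
        a, b = random.sample(range(n_scripts), 2)
        p_a_wins = 1 / (1 + math.exp(-(true_quality[a] - true_quality[b])))
        comparisons.append((a, b) if random.random() < p_a_wins else (b, a))

    measures = fit_bradley_terry(n_scripts, comparisons)
    rank_order = sorted(range(n_scripts), key=lambda i: -measures[i])
    print("rank order (best first):", rank_order)
    print("SSR-style reliability:", round(scale_separation_reliability(measures, comparisons), 2))
```

With enough comparisons per script, the reliability index approaches the high values reported in the abstract; with sparse comparison designs it drops, which is one practical consideration when planning CJ sessions.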