Exploring the scoring validity of holistic and dimension-based Comparative Judgements of young learners' EFL writing

Sickinger, Beci and Pill, John and Brunfaut, Tineke (2025) Exploring the scoring validity of holistic and dimension-based Comparative Judgements of young learners' EFL writing. Assessing Writing, 66: 100986. ISSN 1075-2935

Text (Sickinger2025Exploringthescoring)
Sickinger2025Exploringthescoring.pdf - Published Version (2MB)
Available under a Creative Commons Attribution license.

Abstract

Comparative Judgement (CJ) is a pairwise comparison evaluation method, typically conducted online. Multiple judges each compare the quality of a series of paired performances and, from their decisions, a rank order is constructed and scores are calculated. Research across different educational contexts supports CJ’s reliability for evaluating written performances, its potential for more precise scoring of scripts, and its use for dimension-focused evaluation. However, scant insights are available into the basis of judges’ evaluations. This issue is important because argument-based approaches to validation (common in the field of language testing and adopted in this study) require evidence to support claims that scores are appropriate for the test purpose. Therefore, we investigate the scoring validity of CJ, both when used holistically (the standard application of CJ) and when evaluating scripts by individual criteria (termed dimensions in the research context). Twenty-seven judges evaluated 300 scripts addressing two writing task types in a national English as a Foreign Language examination for young learners in Austria. Judges reported via questionnaires what they had focused on while judging. Subsequently, eight judges provided think-aloud data while evaluating 157 scripts, offering further insight into the writing features they considered and their decision-making during CJ. Findings showed that while most judges adopted a decision-making process similar to traditional rating methods, some adapted their approach to accommodate the nature of CJ evaluation. Furthermore, results indicated that the judges considered construct-relevant criteria when using CJ, both holistically and by dimension, thus supporting an argument for the appropriateness of using CJ in this context.

Item Type:
Journal Article
Journal or Publication Title:
Assessing Writing
Uncontrolled Keywords:
assessing writing; comparative judgement; young learners; language testing; testing writing; rating
Subjects:
education; linguistics and language; language and linguistics
ID Code:
233088
Deposited By:
Deposited On:
16 Oct 2025 09:50
Refereed?:
Yes
Published?:
Published
Last Modified:
17 Oct 2025 02:10