Replicating the validation of the CEFR illustrative descriptors

Jones, Glyn (2023) Replicating the validation of the CEFR illustrative descriptors. PhD thesis.

2023jonesphd.pdf - Published Version (4MB)
Restricted to Repository staff only until 31 October 2028.

Abstract

The Common European Framework of Reference for Languages (CEFR) has become highly influential in the spheres of second language learning, teaching and assessment since its publication in 2001. However, despite its significance and reach, the research undertaken by Brian North and Günther Schneider in 1994-5 (North, 1996, 2000; Schneider & North, 2000), which culminated in the calibration of many of the illustrative descriptors of the CEFR, has not to date been replicated. More than twenty years on, replication is vital to check the reliability of the initial calibrations and to address some of the limitations observed in the original study. This study is a partial replication of North's original research, with some key innovations and extensions.

First, 492 teachers were recruited from over 40 countries to take part in an online rating exercise modelled on North's (1996, 2000) methodology. Each teacher was asked to rate two of their learners using a checklist of 40 descriptors, randomly allocated to each participant from a pool of 368. In addition, a subset of teachers submitted samples of written work produced by the same two learners they had rated. These samples (N = 90) were distributed to other participating teachers, who were asked to rate them, again using descriptors. This overlapping design meant that a substantial subset of learners was assessed by more than one teacher. Teachers also completed a questionnaire recording their gender, first language, location, teaching experience, institutional setting and familiarity with the CEFR.

Teachers' ratings were analysed with many-facet Rasch measurement (using the FACETS program) to obtain ability measures for the learners, severity measures for the teachers as judges, and difficulty measures for the descriptors. The data were first cleaned to remove extreme cases and misfitting judges. Two series of analyses were conducted: one in which the difficulty measures were unconstrained, and one in which certain descriptors were anchored to North's original measures.

Findings revealed that, when analysed independently, the descriptors fell on a plausible linear scale. When anchored to North and Schneider's original values, the majority of descriptors were placed at the same level as in the published CEFR, or at an adjacent level; the correlation between the difficulty measures obtained here and North's original values was r = .86. The study also included descriptors that had not been calibrated in the original research, among them many of the CEFR's descriptors for Writing, so that empirically derived difficulty measures were obtained for these descriptors for the first time. Most of these measures are consistent with the descriptors' placement in the CEFR scales, with some exceptions where the measures obtained suggest that the relevant descriptors should be placed at a different level from their current position; these cases are discussed. Implications are drawn for current interpretations of the CEFR and for the importance of replication in language testing research more generally.
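For reference, the many-facet Rasch model that underlies this kind of analysis is conventionally written as follows (this is the standard formulation associated with FACETS, not an equation quoted from the thesis):

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - C_j - D_i - F_k

where P_{nijk} is the probability that learner n is awarded category k by judge j on descriptor i, P_{nij(k-1)} is the probability of the adjacent lower category, B_n is the ability of learner n, C_j the severity of judge j, D_i the difficulty of descriptor i, and F_k the threshold of rating category k. Anchoring, as described above, fixes selected D_i values to North's original estimates so that the remaining parameters are estimated on the same scale.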

Item Type:
Thesis (PhD)
ID Code:
207097
Deposited On:
16 Oct 2023 11:35
Refereed?:
No
Published?:
Unpublished
Last Modified:
16 Oct 2023 11:35