Bottini, Raffaella (2024) Corpus methods for linguistic analysis. In: Research methods for applied linguistics : a practical guide. Edinburgh University Press, Edinburgh. (In Press)
Full text not available from this repository.
Abstract
This chapter introduces corpus linguistics as a scientific method for both the theoretical and applied analysis of written and spoken language data. It first provides key definitions in corpus linguistics (e.g., corpus, token, type, annotation, metadata) and introduces different types of corpora, describing the core features (e.g., representativeness, balance, sampling) to consider when building or selecting a corpus. In addition to well-known general corpora like the British National Corpus (which represents British English used in a wide range of communicative contexts), the chapter also describes specialized corpora that contain, for example, transcripts of second language speech, multilingual data, or register-specific and topic-specific texts. It then provides an overview of key techniques in corpus linguistics (e.g., frequency counts, concordances, collocations) which allow both quantitative and qualitative analysis of linguistic data. These techniques demonstrate the wide range of applications of corpus methods, for instance, to inform research in stylistics, forensic linguistics, language acquisition, and discourse analysis. The chapter includes a case study that analyses second language production to illustrate corpus techniques, freely available tools, and how to report corpus findings. Challenges in the application of corpus methods to linguistic analysis are identified (e.g., selecting or building a representative corpus, processing and annotating data, ethical issues and copyright) and possible solutions are outlined.