Corpus methods for linguistic analysis

Bottini, Raffaella (2026) Corpus methods for linguistic analysis. In: Research methods for applied linguistics : a practical guide. Edinburgh University Press, Edinburgh. ISBN 9781399528986 (In Press)

Full text not available from this repository.

Abstract

This chapter introduces corpus linguistics as a scientific method for both the theoretical and applied analysis of written and spoken language data. It first provides key definitions in corpus linguistics (e.g., corpus, token, type, annotation, metadata) and introduces different types of corpora, describing the core features (e.g., representativeness, balance, sampling) to consider when building or selecting a corpus. In addition to well-known general corpora like the British National Corpus (which represents British English used in a wide range of communicative contexts), the chapter also describes specialized corpora that contain, for example, transcripts of second language speech, multilingual data, and register-specific or topic-specific texts. It then provides an overview of key techniques in corpus linguistics (e.g., frequency counts, concordances, collocations) which allow both quantitative and qualitative analysis of linguistic data. These techniques demonstrate the wide range of applications of corpus methods, which can inform research in, for instance, stylistics, forensic linguistics, language acquisition, and discourse analysis. The chapter includes a case study that analyses second language production to illustrate corpus techniques, freely available tools, and how to report corpus findings. Challenges in the application of corpus methods to linguistic analysis are identified (e.g., selecting or building a representative corpus, processing and annotating data, ethical issues and copyright) and possible solutions are outlined.

Item Type:

Contribution in Book/Report/Proceedings

Uncontrolled Keywords:

Research Output Funding/no_not_funded

Subjects:

No - not funded

Departments:

Faculty of Arts & Social Sciences > Linguistics & English Language

ID Code:

226304

Deposited By:

ep_importer_pure

Deposited On:

11 Dec 2024 09:30

Refereed?:

Yes

Published?:

In Press

Last Modified:

07 Jan 2026 13:55

URI:

https://eprints.lancs.ac.uk/id/eprint/226304
