Corpus Linguistics software : Understanding their usages and delivering two new tools

Rodrigues Gomide, Andressa and Hardie, Andrew (2020) Corpus Linguistics software : Understanding their usages and delivering two new tools. PhD thesis, Lancaster University.

Text (2020rodriguesgomidephd)
2020rodriguesgomidephd.pdf - Published Version



The increasing availability of computers to ordinary users in the last few decades has led to an exponential increase in the use of Corpus Linguistics (CL) methodologies. The people exploring corpus data come from a variety of backgrounds and, in many cases, are not proficient corpus linguists. Despite the ongoing development of new tools, there is still an immense gap between what CL can offer and what is currently being done by researchers. This study has two outcomes. It (a) identifies the gap between potential and actual uses of CL methods and tools, and (b) enhances the usability of CL software and complements statistical applications through the use of data visualization and user-friendly interfaces. The first outcome is achieved through (i) an investigation of how CL methods are reported in academic publications; (ii) a systematic observation of users of CL software as they engage in routine tasks; and (iii) a review of four well-established pieces of software used for corpus exploration. Based on the findings, two new, highly usable statistical tools for CL studies were developed and implemented in an existing system, CQPweb. The Advanced Dispersion tool allows users to graphically explore how queries are distributed in a corpus, which makes it easier for users to understand the concept of dispersion. The tool also provides accurate dispersion measures. The Parlink Tool was designed primarily for beginners with an interest in translation studies and second language education. The tool’s primary function is to make it easier for users to see possible translations for corpus queries in the parallel concordances, without the need to use external resources, such as translation memories.
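
The abstract does not say which dispersion measures the Advanced Dispersion tool reports. As an illustration of what a dispersion measure captures, the sketch below computes one widely used measure, Gries's deviation of proportions (DP). The choice of DP, the function name and the input layout are assumptions made for this example and are not taken from the thesis or from CQPweb. DP compares the share of a query's hits found in each corpus part with the share of the corpus that part represents; values near 0 indicate an even spread, and values approaching 1 indicate that hits are concentrated in a small portion of the corpus.

```python
def deviation_of_proportions(hits_per_part, tokens_per_part):
    """Return Gries's DP for one query (illustrative helper, not CQPweb code).

    hits_per_part   -- number of query hits in each corpus part
    tokens_per_part -- size of each corpus part in tokens, same order
    """
    if len(hits_per_part) != len(tokens_per_part):
        raise ValueError("both lists must describe the same corpus parts")

    total_hits = sum(hits_per_part)
    total_tokens = sum(tokens_per_part)
    if total_hits == 0 or total_tokens == 0:
        raise ValueError("DP is undefined without hits or without tokens")

    # Sum the absolute differences between the observed share of hits
    # and the expected share (the part's proportion of the corpus),
    # then halve the total.
    return 0.5 * sum(
        abs(hits / total_hits - tokens / total_tokens)
        for hits, tokens in zip(hits_per_part, tokens_per_part)
    )


# Four equally sized parts: an even spread versus all hits in one part.
even = deviation_of_proportions([10, 10, 10, 10], [1000, 1000, 1000, 1000])
skewed = deviation_of_proportions([40, 0, 0, 0], [1000, 1000, 1000, 1000])
print(even, skewed)
```

Both queries in this example have the same overall frequency (40 hits), yet their DP values differ, which is the kind of contrast a graphical dispersion display is meant to make visible.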

Item Type:
Thesis (PhD)
ID Code:
Deposited By:
Deposited On:
02 Dec 2020 10:25
Last Modified:
10 Feb 2024 00:20