Coole, Matt and Rayson, Paul Edward and Mariani, John Amedeo (2016) lexiDB: a scalable corpus database management system. In: 2016 IEEE International Conference on Big Data (Big Data). IEEE, pp. 3880-3884. ISBN 9781467390064
lexidb_scalable_corpus.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial.
Abstract
lexiDB is a scalable corpus database management system designed to fulfill corpus linguistics retrieval queries on multi-billion-word, multiply-annotated corpora. It is based on a distributed architecture that allows the system to scale out to support ever larger text collections. This paper presents an overview of the architecture behind lexiDB as well as a demonstration of its functionality. We present lexiDB's performance metrics, measured on AWS (Amazon Web Services) infrastructure, with two billion-word corpora that are part-of-speech and semantically tagged: Historical Hansard and EEBO (Early English Books Online).