lexiDB: a scalable corpus database management system

Coole, Matt and Rayson, Paul Edward and Mariani, John Amedeo (2016) lexiDB: a scalable corpus database management system. In: 2016 IEEE International Conference on Big Data (Big Data). IEEE, pp. 3880-3884. ISBN 9781467390064

PDF (lexidb-scalable-corpus)
lexidb_scalable_corpus.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial.

lexiDB is a scalable corpus database management system designed to fulfill corpus linguistics retrieval queries on multi-billion-word, multiply-annotated corpora. It is based on a distributed architecture that allows the system to scale out to support ever larger text collections. This paper presents an overview of the architecture behind lexiDB as well as a demonstration of its functionality. We present lexiDB's performance metrics, measured on Amazon Web Services (AWS) infrastructure, with two billion-word corpora tagged for part of speech and semantics: Historical Hansard and EEBO (Early English Books Online).

Item Type:
Contribution in Book/Report/Proceedings
Additional Information:
©2016 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Deposited On:
09 Jan 2017 11:56
Last Modified:
18 Oct 2023 00:16