lexiDB: a scalable corpus database management system

Coole, Matt and Rayson, Paul Edward and Mariani, John Amedeo (2016) lexiDB: a scalable corpus database management system. In: 2016 IEEE International Conference on Big Data (Big Data). IEEE, pp. 3880-3884. ISBN 9781467390064

PDF (Accepted Version): lexidb_scalable_corpus.pdf, 150kB
Available under License Creative Commons Attribution-NonCommercial.

Abstract

lexiDB is a scalable corpus database management system designed to fulfill corpus linguistics retrieval queries on multi-billion-word, multiply-annotated corpora. It is based on a distributed architecture that allows the system to scale out to support ever larger text collections. This paper presents an overview of the architecture behind lexiDB as well as a demonstration of its functionality. We present lexiDB's performance metrics on Amazon Web Services (AWS) infrastructure using two billion-word corpora tagged for part of speech and semantic categories: Historical Hansard and EEBO (Early English Books Online).

Item Type: Contribution in Book/Report/Proceedings
Additional Information: ©2016 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
ID Code: 83882
Deposited By:
Deposited On: 09 Jan 2017 11:56
Refereed?: Yes
Published?: Published
Last Modified: 20 Sep 2024 23:53