The ParlaMint corpora of parliamentary proceedings

Erjavec, Tomaž and Ogrodniczuk, Maciej and Osenova, Petya and Ljubešić, Nikola and Simov, Kiril and Pančur, Andrej and Rudolf, Michał and Kopp, Matyáš and Barkarson, Starkaður and Steingrímsson, Steinþór and Çöltekin, Çağrı and de Does, Jesse and Depuydt, Katrien and Agnoloni, Tommaso and Venturi, Giulia and Pérez, María Calzada and de Macedo, Luciana D. and Navarretta, Costanza and Luxardo, Giancarlo and Coole, Matthew and Rayson, Paul and Morkevičius, Vaidas and Krilavičius, Tomas and Darǵis, Roberts and Ring, Orsolya and van Heusden, Ruben and Marx, Maarten and Fišer, Darja (2022) The ParlaMint corpora of parliamentary proceedings. Language Resources and Evaluation. ISSN 1574-0218

Text (s10579-021-09574-0)
s10579_021_09574_0.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

This paper presents the ParlaMint corpora, which contain transcriptions of sessions of 17 European national parliaments and total half a billion words. The corpora are uniformly encoded, contain rich metadata about 11 thousand speakers, and are linguistically annotated with named entities and according to the Universal Dependencies formalism. Samples of the corpora and conversion scripts are available from the project’s GitHub repository. The complete corpora are openly available for download via the CLARIN.SI repository, and for online exploration and analysis through the NoSketch Engine and KonText concordancers and the Parlameter interface.

Item Type:
Journal Article
Journal or Publication Title:
Language Resources and Evaluation
Uncontrolled Keywords:
/dk/atira/pure/subjectarea/asjc/3300/3309
ID Code:
165473
Deposited On:
04 Feb 2022 14:15
Refereed?:
Yes
Published?:
Published
Last Modified:
24 May 2022 00:38