A New Aligned Simple German Corpus

Toborek, Vanessa and Busch, Moritz and Boßert, Malte and Bauckhage, Christian and Welke, Pascal (2023) A New Aligned Simple German Corpus. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics (ACL Anthology), Stroudsburg, PA, pp. 11393-11412. ISBN 9781959429722

Full text not available from this repository.

Abstract

“Leichte Sprache”, the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German – German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by the F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.

Item Type:
Contribution in Book/Report/Proceedings
Additional Information:
DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
ID Code:
228766
Deposited On:
20 May 2025 13:45
Refereed?:
Yes
Published?:
Published
Last Modified:
21 May 2025 01:35