Developing geographically oriented NLP approaches to sixteenth–century historical documents : digging into early colonial Mexico

Jiménez Badillo, Diego and Murrieta-Flores, Patricia and Martins, Bruno and Gregory, Ian and Favila-Vázquez, Mariana and Liceras-Garrido, Raquel (2020) Developing geographically oriented NLP approaches to sixteenth–century historical documents : digging into early colonial Mexico. Digital Humanities Quarterly, 14 (4). ISSN 1938-4122

Jimenez_Badillo_et_al_2020_NLP_approaches.pdf - Accepted Version
Available under License Creative Commons Attribution-NoDerivs.

Abstract

This article introduces an ongoing Digital Humanities project that combines Natural Language Processing, Corpus Linguistics, Machine Learning, and Spatial Analysis to advance the computational analysis of large historical corpora. As a case study, the project concentrates on the Relaciones Geográficas de la Nueva España (1577–1585), one of the key corpora for understanding the early colonial period of Mexico. Using a computer-assisted methodology called Geographical Text Analysis (GTA), the project offers automatic means for parsing historical texts and marking up words that refer to place names (toponyms) or to analytical concepts, which are then linked to their geographic locations. Adding geospatial intelligence to the parsing of texts makes it possible to explore hidden geographies and narratives in the historical corpus. The article provides a general overview of the corpus, describes the GTA methodology step by step, and reports on the progress achieved so far.
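
The core idea of marking up toponyms and linking them to locations can be illustrated with a minimal sketch. This is not the project's implementation: the abstract describes NLP and machine-learning methods, whereas the example below substitutes a plain dictionary lookup. The names `Place`, `GAZETTEER`, and `tag_toponyms` are hypothetical, and the two gazetteer entries use approximate coordinates chosen only for illustration.

```python
import re
from typing import NamedTuple


class Place(NamedTuple):
    name: str
    lat: float
    lon: float


# Hypothetical mini-gazetteer; coordinates are approximate and illustrative,
# not taken from the project's own gazetteer.
GAZETTEER = {
    "cholula": Place("Cholula", 19.06, -98.30),
    "texcoco": Place("Texcoco", 19.51, -98.88),
}


def tag_toponyms(text, gazetteer=GAZETTEER):
    """Return (matched text, start offset, Place) for each gazetteer name in text."""
    if not gazetteer:
        return []
    # Longest names first so that multi-word names win over their substrings.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.group(0), m.start(), gazetteer[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sample = "The road from Cholula leads toward Texcoco."
hits = tag_toponyms(sample)
for surface, offset, place in hits:
    print(f"{surface} @ {offset}: ({place.lat}, {place.lon})")
```

A lookup of this kind cannot handle the spelling variation or ambiguous place names typical of sixteenth-century sources, which is one reason a methodology such as GTA relies on richer language processing.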

Item Type:
Journal Article
Journal or Publication Title:
Digital Humanities Quarterly
Subjects:
digital humanities, natural language processing, machine learning, sixteenth-century, text analysis, Mexico, spatial humanities, gazetteer
ID Code:
152123
Deposited By:
Deposited On:
25 Feb 2021 14:30
Refereed?:
Yes
Published?:
Published
Last Modified:
12 Jan 2024 00:21