Toponym matching through deep neural networks

Santos, Rui and Murrieta-Flores, Patricia and Calado, Pável and Martins, Bruno (2017) Toponym matching through deep neural networks. International Journal of Geographical Information Science, 32 (2). pp. 324-348. ISSN 1365-8816

PDF (Manusc_Toponym_Matching_Through_Deep_Neural_Networks)
Manusc_Toponym_Matching_Through_Deep_Neural_Networks.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial.

Abstract

Toponym matching, i.e. pairing strings that represent the same real-world location, is a fundamental problem for several practical applications. The current state-of-the-art relies on string similarity metrics, either specifically developed for matching place names or integrated within methods that combine multiple metrics. However, these methods all rely on common sub-strings in order to establish similarity, and they do not effectively capture the character replacements involved in toponym changes due to transliterations or to changes in language and culture over time. In this article, we present a novel matching approach, leveraging a deep neural network to classify pairs of toponyms as either matching or nonmatching. The proposed network architecture uses recurrent nodes to build representations from the sequences of bytes that correspond to the strings that are to be matched. These representations are then combined and passed to feed-forward nodes, finally leading to a classification decision. We present the results of a wide-ranging evaluation of the performance of the proposed method, using a large dataset collected from the GeoNames gazetteer. These results show that the proposed method can significantly outperform individual similarity metrics from previous studies, as well as previous methods based on supervised machine learning for combining multiple metrics.
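
The data flow described in the abstract (bytes, then recurrent representations, then a combination step, then feed-forward nodes, then a match/non-match decision) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the type of recurrent unit, the layer sizes, or how the two representations are combined, so the plain tanh recurrence, the hidden size of 8, the concatenation step, and all names below (`to_bytes`, `encode`, `make_params`, `match_probability`) are assumptions made for illustration. The weights are random and untrained, so the returned probability carries no meaning beyond showing the shape of the computation.

```python
import math
import random

HIDDEN = 8  # hypothetical hidden size, chosen only for illustration


def to_bytes(name):
    """Encode a toponym as the list of its UTF-8 byte values."""
    return list(name.encode("utf-8"))


def make_params(seed=0):
    """Create small random (untrained) weights for the sketch."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    return {
        "w_in": vec(HIDDEN),                               # byte input -> recurrent state
        "w_rec": [vec(HIDDEN) for _ in range(HIDDEN)],     # recurrent state -> recurrent state
        "w_ff": [vec(2 * HIDDEN) for _ in range(HIDDEN)],  # combined pair -> feed-forward layer
        "w_out": vec(HIDDEN),                              # feed-forward layer -> logit
    }


def encode(byte_seq, params):
    """Run a plain tanh recurrence over the bytes and return the final state."""
    h = [0.0] * HIDDEN
    for b in byte_seq:
        x = b / 255.0  # scale each byte value to [0, 1]
        h = [
            math.tanh(
                params["w_in"][i] * x
                + sum(params["w_rec"][i][j] * h[j] for j in range(HIDDEN))
            )
            for i in range(HIDDEN)
        ]
    return h


def match_probability(a, b, params):
    """Score a toponym pair: shared encoder, concatenation, ReLU layer, sigmoid."""
    combined = encode(to_bytes(a), params) + encode(to_bytes(b), params)
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, combined)))
        for row in params["w_ff"]
    ]
    logit = sum(w * v for w, v in zip(params["w_out"], hidden))
    return 1.0 / (1.0 + math.exp(-logit))


params = make_params(seed=0)
score = match_probability("Lisboa", "Lisbon", params)
print(f"untrained match score: {score:.3f}")
```

Working on bytes rather than characters means that names in any script map onto the same input alphabet of 256 values: `to_bytes("Москва")`, for instance, yields 12 values, two per Cyrillic letter. In a trained model of this shape, the weights would be fitted on labelled matching and non-matching pairs, such as pairs drawn from a gazetteer.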

Item Type:

Journal Article

Journal or Publication Title:

International Journal of Geographical Information Science

Additional Information:

This is an Accepted Manuscript of an article published by Taylor & Francis in International Journal of Geographical Information Science on 31/10/2017, available online: http://www.tandfonline.com/10.1080/13658816.2017.1390119

Uncontrolled Keywords:

/dk/atira/pure/subjectarea/asjc/1700/1710

Subjects:

approximate string matching; deep neural networks; duplicate detection; geographic information retrieval; recurrent neural networks; toponym matching; information systems; geography, planning and development; library and information sciences

Departments:

Faculty of Arts & Social Sciences > History

ID Code:

89480

Deposited By:

ep_importer_pure

Deposited On:

08 Jan 2018 10:16

Refereed?:

Yes

Published?:

Published

Last Modified:

23 May 2025 00:01

URI:

https://eprints.lancs.ac.uk/id/eprint/89480