Biospytial: spatial graph-based computing for ecological big data

Escamilla Molgora, Juan Manuel and Sedda, Luigi and Atkinson, Peter (2020) Biospytial: spatial graph-based computing for ecological big data. GigaScience, 9 (5). pp. 1-25. ISSN 2047-217X

Full text not available from this repository.

Abstract

Biospytial is a modular open-source knowledge engine designed to import, organise, analyse and visualise big spatial ecological datasets using the power of graph theory. Specifically, it handles species occurrences and their taxonomic classification to support ecological analyses of biodiversity and species distributions. The engine uses a hybrid graph-relational approach to store and access information. The data are linked with relationships that are stored in a graph database, while tabular and geospatial (vector and raster) data are stored in a relational database management system (RDBMS). The graph data structure provides a scalable design that eases the problem of merging datasets from different sources. The linkage relationships use semantic structures (objects and predicates) to answer scientific questions represented as complex data structures stored in the graph database. In this sense, we used species occurrences, taxonomic classification, and climatic datasets to build a knowledge graph of the Tree of Life embedded in an environmental and geographical grid. Biospytial comprises three interconnected components: i) a Geospatial Processing Unit (GPU) supported by an RDBMS with geoprocessing capabilities, ii) a Graph Storage and Querying Unit, and iii) a graph-relational package, called the Biospytial Computing Engine (BCE), that integrates all the system’s components. It also includes tools such as interactive notebooks (Jupyter), graph analytic libraries (NetworkX) and statistical frameworks (PyMC3). The Biospytial approach reduces the complexity of joining datasets using multiple primary-foreign key relations, a drawback in RDBMS. Applied to ecological data, it allows the discovery and inference of relationships using the interconnected network of taxonomic and spatial relationships.
Its modular and scalable design makes it possible to run and distribute several instances simultaneously, allowing fast and efficient handling of big and complex ecological datasets. An example applied to the conservation of threatened species from the IUCN Red List using the co-occurrence of jaguars (Panthera onca) is included. This example demonstrates the engine’s capabilities in basic manipulation of taxonomic trees and in the analysis and visualisation of taxonomic groups co-occurring in space.
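
The hybrid graph-relational idea described in the abstract can be illustrated with a minimal sketch. This is not the Biospytial API: the table name, the predicate names (IS_IN, HAS, IS_MEMBER_OF), the grid cell identifiers, the occurrence records and the helper functions are all hypothetical, chosen only to show how tabular occurrences held in a relational store might be linked through graph edges to answer a co-occurrence question without multi-table joins.

```python
import sqlite3
from collections import defaultdict

# Relational side: tabular occurrence records (hypothetical toy data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE occurrence (id INTEGER PRIMARY KEY, species TEXT, cell TEXT)"
)
conn.executemany(
    "INSERT INTO occurrence (species, cell) VALUES (?, ?)",
    [
        ("Panthera onca", "cell_1"),
        ("Tapirus bairdii", "cell_1"),
        ("Ara macao", "cell_1"),
        ("Panthera onca", "cell_2"),
        ("Puma concolor", "cell_2"),
        ("Ara macao", "cell_3"),
    ],
)

# Graph side: typed edges stored as (subject, predicate) -> set of objects.
edges = defaultdict(set)


def link(subject, predicate, obj):
    """Add one directed, labelled edge to the in-memory graph."""
    edges[(subject, predicate)].add(obj)


# Taxonomic edges (species -> family); a small hand-written subset.
for species, family in [
    ("Panthera onca", "Felidae"),
    ("Puma concolor", "Felidae"),
    ("Tapirus bairdii", "Tapiridae"),
    ("Ara macao", "Psittacidae"),
]:
    link(species, "IS_MEMBER_OF", family)

# Spatial edges derived from the relational table.
for species, cell in conn.execute("SELECT DISTINCT species, cell FROM occurrence"):
    link(species, "IS_IN", cell)
    link(cell, "HAS", species)


def co_occurring(species):
    """Species sharing at least one grid cell with the given species."""
    found = set()
    for cell in edges[(species, "IS_IN")]:
        found |= edges[(cell, "HAS")]
    found.discard(species)
    return found


def co_occurring_families(species):
    """Families of the species that co-occur with the given species."""
    families = set()
    for other in co_occurring(species):
        families |= edges[(other, "IS_MEMBER_OF")]
    return families


print(sorted(co_occurring("Panthera onca")))
print(sorted(co_occurring_families("Panthera onca")))
```

Each query here is a short traversal (species to cell to species to family) rather than a chain of primary-foreign key joins, which is the kind of simplification the abstract attributes to the graph design. A real deployment would use a graph database and a spatially enabled RDBMS instead of a dictionary and an in-memory SQLite table.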

Item Type:
Journal Article
Journal or Publication Title:
GigaScience
Uncontrolled Keywords:
/dk/atira/pure/subjectarea/asjc/1700/1706
ID Code:
142894
Deposited On:
12 May 2020 08:10
Refereed?:
Yes
Published?:
Published
Last Modified:
06 Jul 2020 01:26