Statistical modelling of species distributions using presence-only data : A semantic and graphical approach using the tree of life

Escamilla Molgora, Juan Manuel and Atkinson, Peter and Sedda, Luigi and Diggle, Peter (2021) Statistical modelling of species distributions using presence-only data : A semantic and graphical approach using the tree of life. PhD thesis, Lancaster University.

2021Escamilla_MolgoraPhD.pdf - Published Version
Available under License Creative Commons Attribution-NonCommercial-ShareAlike.

Understanding the mechanisms that determine and differentiate the establishment of organisms in space is an old and fundamental question in ecology. The emergence of life’s spatial patterns is guided by the confluence of three forces: environmental filtering, which makes establishment more or less likely for organisms depending on their evolutionary adaptations to local environmental conditions; biological interactions, which restrict establishment according to the presence (or absence) of other organisms; and the diversification of organisms’ strategies (traits) to migrate and adapt to changing environments. The main hypothesis of this research is that the accumulated knowledge of biodiversity occurrences, species taxonomic classification and geospatial environmental data can be integrated into a unified modelling framework to characterise the joint effect of these three forces and thus contribute to more general, accurate and statistically sound species distribution models (SDMs). The first part of this thesis describes the design and implementation of a knowledge engine capable of synthesising and integrating environmental geospatial data, taxonomic relationships and species occurrences. It uses semantic queries to instantiate complex data structures represented as networks of concepts (knowledge graphs). Local taxonomic trees, distributed over a hierarchical spatial system of regular lattices, are used as knowledge graphs to perform data synthesis, geoprocessing and transformations. The implementation uses efficient call-by-need (lazy) evaluation, which facilitates spatial and multi-scale analysis of large datasets. The second part of the thesis covers the statistical specification and implementation of two modelling frameworks for species distribution models, one for a single species and another for multiple species. Both are designed for presence-only observations obtained from the knowledge engine.
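The abstract describes the knowledge engine only in outline. As a rough illustration of two ideas it names, a hierarchy of regular lattices and call-by-need evaluation of per-cell taxonomic trees, here is a minimal Python sketch. It assumes each occurrence record is a (longitude, latitude, taxonomic path) tuple and uses a dyadic 2^k x 2^k grid over a longitude/latitude box; the names `cell_id`, `parent`, `Cell`, `build_lattice` and `coarsen` are hypothetical, the records are made-up toy data, and the sketch leaves out the semantic queries and knowledge-graph storage of the thesis's engine. Laziness comes from `functools.cached_property`: a cell's tree is built only the first time it is requested.

```python
from collections import defaultdict
from functools import cached_property

# Hypothetical record layout: (longitude, latitude, taxonomic path from kingdom down to species).
# Lattice level k splits the bounding box into 2**k x 2**k regular cells.
BBOX = (-180.0, -90.0, 180.0, 90.0)


def cell_id(lon, lat, level, bbox=BBOX):
    """Return (level, row, col) of the lattice cell containing the point."""
    xmin, ymin, xmax, ymax = bbox
    n = 2 ** level
    col = min(int((lon - xmin) / (xmax - xmin) * n), n - 1)
    row = min(int((lat - ymin) / (ymax - ymin) * n), n - 1)
    return (level, row, col)


def parent(cid):
    """Cell one level up that contains cell `cid` (each parent has four children)."""
    level, row, col = cid
    return (level - 1, row // 2, col // 2)


class Cell:
    def __init__(self, cid, occurrences):
        self.cid = cid
        self.occurrences = occurrences  # raw records; nothing is processed yet

    @cached_property
    def tree(self):
        """Local taxonomic tree as nested dicts, built on first access and then cached."""
        root = {}
        for _, _, path in self.occurrences:
            node = root
            for taxon in path:
                node = node.setdefault(taxon, {})
        return root

    def count(self, *path):
        """Number of records in this cell whose taxonomic path starts with `path`."""
        k = len(path)
        return sum(1 for _, _, p in self.occurrences if p[:k] == path)


def build_lattice(occurrences, level):
    """Group records into the non-empty cells of one lattice level."""
    buckets = defaultdict(list)
    for occ in occurrences:
        buckets[cell_id(occ[0], occ[1], level)].append(occ)
    return {cid: Cell(cid, occs) for cid, occs in buckets.items()}


def coarsen(lattice):
    """Merge every cell's records into its parent cell, one level up."""
    merged = defaultdict(list)
    for cid, cell in lattice.items():
        merged[parent(cid)].extend(cell.occurrences)
    return {cid: Cell(cid, occs) for cid, occs in merged.items()}


# Made-up toy records, for illustration only.
occs = [
    (-96.90, 19.50, ("Animalia", "Chordata", "Aves", "Passeriformes", "Turdus grayi")),
    (-96.85, 19.45, ("Animalia", "Chordata", "Aves", "Passeriformes", "Turdus grayi")),
    (-96.90, 19.50, ("Animalia", "Chordata", "Mammalia", "Chiroptera", "Artibeus jamaicensis")),
    (-99.10, 19.40, ("Plantae", "Tracheophyta", "Magnoliopsida", "Fagales", "Quercus rugosa")),
]
fine = build_lattice(occs, level=10)
cell = fine[cell_id(-96.90, 19.50, 10)]
print(len(fine), cell.count("Animalia", "Chordata", "Aves"))  # prints: 2 2
```

Because each parent cell contains exactly four children, coarsening the level-10 lattice four times yields the same cells as building directly at level 6, so moving between scales only requires regrouping records that are already binned.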
The two SDM frameworks share the assumption that presence-only observations are the joint effect of two latent processes: one that determines the presence of the species (ecological suitability) and another that determines the probability of being sampled (sampling effort). The single-species framework uses an informative sample, chosen by the modeller, to account for the sampling effort. Three modelling strategies are proposed for accounting for the joint effect of the ecological and sampling processes: independent processes, a common spatial random effect, and correlated processes. The three models were compared with the maximum entropy model (MaxEnt), a popular algorithm used in SDMs; in all cases, at least one of them showed better predictive performance than MaxEnt. The multi-species modelling framework generalises the single-species framework into a joint species distribution model for presence-only data. It is specified as a multilevel hierarchical logistic model with a single spatial random effect common to all species of interest. The sampling effort is modelled as a complementary sample, drawn from complementary observations of the taxa of interest within a regional taxonomic tree. The model was tested against simulated data: all simulated parameters were covered by the credible intervals of the posterior samples. A case study in Eastern Mexico is presented as an application of the model. Its results were consistent with macroecological theory, and the model proved effective in removing the bias and noise introduced by the sampling effort. This effect was particularly marked in urban areas, where sampling intensity is greater. The research presented here provides an interdisciplinary approach to modelling joint species distributions, aided by the automated selection of biological, spatial and environmental context.
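The two-process structure described above can be made concrete with a small simulation. This is a minimal sketch, not the thesis's model code: it assumes logistic links, i.i.d. normal random effects standing in for a spatially structured one, and hypothetical covariates `x` (environment) and `z` (sampling accessibility). A cell yields a record for species j only if the species is present (S) and the cell was sampled (R), so, given the random effect, P(Y = 1) = p_s p_r. The random effect `w` is shared by all species and, as in the "common spatial random effect" strategy, also by the sampling process; setting `sd_w=0` gives the independent-processes variant. Sharing `w` with the sampling process in the multi-species case is an assumption made here for illustration.

```python
import numpy as np


def logistic(eta):
    return 1.0 / (1.0 + np.exp(-eta))


def simulate_joint(n_cells, betas, alpha, sd_w=1.0, seed=0):
    """Simulate presence-only records for J species that share one random effect.

    For cell i and species j (all names hypothetical):
      S_ij ~ Bernoulli(logistic(b0_j + b1_j * x_i + w_i))   presence (ecological suitability)
      R_i  ~ Bernoulli(logistic(a0 + a1 * z_i + w_i))       cell sampled (sampling effort)
      Y_ij = S_ij * R_i                                     presence-only record
    w_i is i.i.d. normal here; a real SDM would use a spatially correlated effect.
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)  # shape (J, 2): intercept and slope per species
    x = rng.normal(size=n_cells)            # environmental covariate
    z = rng.normal(size=n_cells)            # sampling-accessibility covariate
    w = rng.normal(scale=sd_w, size=n_cells)
    p_s = logistic(betas[:, :1] + betas[:, 1:] * x + w)  # shape (J, n_cells)
    p_r = logistic(alpha[0] + alpha[1] * z + w)          # shape (n_cells,)
    s = rng.random(p_s.shape) < p_s                      # true presences
    r = rng.random(n_cells) < p_r                        # sampled cells, common to all species
    y = s & r                                            # only sampled presences are recorded
    return {"x": x, "z": z, "w": w, "p_s": p_s, "p_r": p_r, "s": s, "r": r, "y": y}


def log_likelihood(y, p_s, p_r):
    """Bernoulli log-likelihood of the records, using P(Y_ij = 1 | w) = p_s[j, i] * p_r[i]."""
    p = p_s * p_r
    return float(np.sum(np.where(y, np.log(p), np.log1p(-p))))


sim = simulate_joint(100_000, betas=[(0.5, 1.0), (-1.0, -0.5)], alpha=(-0.5, 0.8), seed=3)
print("true prevalence:    ", sim["s"].mean(axis=1))
print("recorded prevalence:", sim["y"].mean(axis=1))  # lower: unsampled presences are missed
print("log-likelihood:", log_likelihood(sim["y"], sim["p_s"], sim["p_r"]))
```

Fitting a model of this kind, as the thesis does through posterior sampling, means inferring the coefficients and random effects from `y` alone; `log_likelihood` is the data term such a sampler could evaluate under this simplified structure.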

Item Type:
Thesis (PhD)
Uncontrolled Keywords:
species distribution models; spatial statistics; cloud computing environments; big data for ecology; knowledge-based systems; information systems; ecological modelling; statistics and probability
Deposited On:
18 Jan 2021 09:55
Last Modified:
23 May 2024 01:28