Linking DNA Metabarcoding and Text Mining to Create Network-Based Biomonitoring Tools: A Case Study on Boreal Wetland Macroinvertebrate Communities

Compson, Zacchaeus G. and Monk, Wendy A. and Curry, Colin J. and Gravel, Dominique and Bush, Alex and Baker, Christopher J.O. and Al Manir, Mohammad Sadnan and Riazanov, Alexandre and Hajibabaei, Mehrdad and Shokralla, Shadi and Gibson, Joel F. and Stefani, Sonja and Wright, Michael T.G. and Baird, Donald J. (2018) Linking DNA Metabarcoding and Text Mining to Create Network-Based Biomonitoring Tools: A Case Study on Boreal Wetland Macroinvertebrate Communities. In: Advances in Ecological Research. Advances in Ecological Research. Elsevier, pp. 33-74. ISBN 9780128143179

Full text not available from this repository.


Ecological networks are powerful tools for visualizing biodiversity data and assessing ecosystem health and function. Constructing these networks requires considerable empirical effort and remains highly challenging because of sampling limitations and the laborious, error-prone, and notoriously limited process of traditional taxonomic identification. Recent advancements in high-throughput gene sequencing and high-performance computing provide new ways to address these challenges. DNA metabarcoding, a method of bulk taxonomic identification from DNA extracted from environmental samples, can generate detailed biodiversity information through a standardizable analytical pipeline for species detection. When this biodiversity information is annotated with prior knowledge on taxon interactions, body size, and trophic position, it is possible to generate trait-based networks, which we call “heuristic food webs”. Although curating trait matrices for constructing heuristic food webs is a laborious, often intractable process when done through manual literature surveys, it can be greatly accelerated via text mining, allowing knowledge of relevant traits to be gathered across large databases. To explore this possibility, we employed a General Architecture for Text Engineering (GATE) system to create a hybrid text-mining pipeline combining rule-based and machine-learning modules. This pipeline was then used to query online repositories of published papers for missing data on a key trait, body size, that could not be gathered from existing trophic link libraries of freshwater benthic macroinvertebrates. Combining text-mined body size information with feeding information from existing sources allowed us to generate a database of over 20,000 pairwise trophic interactions. Next, we developed a pipeline that takes taxa lists generated from DNA metabarcoding and annotates them with trophic information from existing databases and text-mined body size data.
In this way, we generated heuristic food webs for wetland sites within a large delta complex formed by the confluence of the Peace and Athabasca rivers in northern Alberta: the Peace–Athabasca delta. Finally, we used these putative food webs and their network properties to resolve spatial and temporal differences between the benthic subwebs of wetlands in the Peace and Athabasca sectors of the delta complex. Specifically, we asked two questions. (1) How do food web properties (e.g. number of links, linkage density, trophic height) differ between the wetlands of the Peace and Athabasca deltas? (2) How do food web properties change temporally in wetlands of the two deltas? We discuss using DNA-generated, trait-based food webs as a powerful tool for rapid bioassessment, assess the limitations of our current approach, and outline a path forward to make this powerful tool more widely available for land managers and conservation biologists.
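As a rough illustration of the annotation step described above, the sketch below filters a trophic link library down to the taxa detected at a site and computes two of the food web properties named in the abstract (number of links and linkage density, taken here as links per taxon). It is not the authors' pipeline: the function names, the data layout, and the toy taxa and links are all assumptions made for illustration, and body-size annotation is left out because the abstract does not say how it enters the link rule.

```python
def build_heuristic_web(detected_taxa, link_library):
    """Return the (consumer, resource) links whose two taxa were both detected.

    detected_taxa: iterable of taxon names, e.g. from a metabarcoding taxa list.
    link_library:  iterable of (consumer, resource) pairs from a trophic database.
    Both structures are hypothetical stand-ins for the real data sources.
    """
    detected = set(detected_taxa)
    return {
        (consumer, resource)
        for consumer, resource in link_library
        if consumer in detected and resource in detected
    }


def web_properties(detected_taxa, links):
    """Compute simple network properties of a heuristic food web."""
    n_taxa = len(set(detected_taxa))
    n_links = len(links)
    # Linkage density is taken as links per taxon (L / S).
    linkage_density = n_links / n_taxa if n_taxa else 0.0
    return {
        "n_taxa": n_taxa,
        "n_links": n_links,
        "linkage_density": linkage_density,
    }


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    site_taxa = ["Aeshna", "Chironomus", "Hyalella"]
    library = [
        ("Aeshna", "Chironomus"),
        ("Aeshna", "Hyalella"),
        ("Dytiscus", "Hyalella"),  # dropped: Dytiscus was not detected
    ]
    web = build_heuristic_web(site_taxa, library)
    print(sorted(web))
    print(web_properties(site_taxa, web))
```

Running the same two functions on taxa lists from different sites or sampling dates would give per-site property tables that could then be compared between delta sectors or across time, which is the kind of comparison the two questions above call for.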

Item Type:
Contribution in Book/Report/Proceedings
Deposited On:
31 Mar 2020 16:15
Last Modified:
20 Sep 2023 02:29