Carter, Alice R. and Gatherer, Derek (2016) Tempus et Locus: a tool for extracting precisely dated viral sequences from GenBank, and its application to the phylogenetics of primate erythroparvovirus 1 (B19V). Biorxiv, 2016.
061697.full.pdf - Submitted Version
Available under License Creative Commons Attribution.
Download (553kB)
Abstract
The presence of data in the collection_date field of a GenBank sequence record is of great assistance in the use of that sequence for Bayesian phylogenetics using tip-dating. We present Tempus et Locus (TeL), a tool for extracting such sequences from a GenBank-formatted sequence database. TeL shows that 60% of viral sequences in GenBank have collection date fields, but that this varies considerably between species. Primate erythroparvovirus 1 (human parvovirus B19 or B19V) has only 40% of its sequences dated, of which only 112 are of more than 4 kb. 100 of these are from B19V sub-genotype 1a and were collected from a mere 6 studies conducted in 5 countries between 2002 and 2013. Nevertheless, Bayesian phylogenetic analysis of this limited set gives a date for the common ancestor of sub-genotype 1a in 1990 (95% HPD 1981-1996) which is in reasonable agreement with estimates of previous studies where collection dates have been assembled by more laborious methods of literature search and direct enquiries to sequence submitters. We conclude that although collection dates should become standard for all future GenBank submissions of virus sequences, accurate dating of ancestors is possible with even a small number of sequences if sampling information is high quality.