ARPHA Preprints, doi: 10.3897/arphapreprints.e84305
Enhancing DNA barcode reference libraries by harvesting terrestrial arthropods at the National Museum of Natural History
Bernardo Santos§, Meredith E. Miller|, Margarita Miklasevskaja|, Jaclyn T.A. McKeown|, Niamh E. Redmond, Jonathan A. Coddington, Jessica Bird, Scott E. Miller, Ashton Smith, Seán G. Brady, Matthew L. Buffington#, M. Lourdes Chamorro#, Torsten Dikow, Michael W. Gates#, Paul Goldstein#, Alexander Konstantinov#, Robert Kula#, Nicholas D. Silverson, M. Alma Solis#, Stephanie L. deWaard|, Suresh Naik|¤, Nadya Nikolova|, Mikko Pentinsaari|, Sean W.J. Prosser|, Jayme E. Sones|, Evgeny V. Zakharov|¤, Jeremy R. deWaard|«
‡ UFES, Brazil
§ American Museum of Natural History, New York, United States of America
| Centre for Biodiversity Genomics, University of Guelph, Guelph, Canada
¶ National Museum of Natural History, Smithsonian Institution, Washington, United States of America
# Systematic Entomology Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, U.S. Department of Agriculture, Washington, United States of America
¤ Department of Integrative Biology, University of Guelph, Guelph, Canada
« School of Environmental Sciences, University of Guelph, Guelph, Canada
Open Access
Abstract

The use of DNA barcoding has revolutionized biodiversity science, but its application depends on the existence of comprehensive and reliable reference libraries. For many poorly known taxa, such reference sequences are missing even at higher-level taxonomic scales. We harvested the collections of the Smithsonian’s National Museum of Natural History (USNM) to generate DNA barcoding sequences for genera of terrestrial arthropods previously not recorded in one or more major public sequence databases. Our workflow used a mix of Sanger and Next-Generation Sequencing (NGS) approaches to maximize sequence recovery while ensuring affordable cost. In total, COI sequences were obtained for 5,686 specimens belonging to 3,888 genera and 202 families. Success rates varied widely according to collection data and focal taxon. NGS helped recover sequences of specimens that failed a previous run of Sanger sequencing. Success rates and the optimal balance between Sanger and NGS are the most important drivers to maximize output and minimize cost in future projects. The corresponding sequence and taxonomic data can be accessed through the Barcode of Life Data System, GenBank, the Global Biodiversity Information Facility, the Global Genome Biodiversity Network Data Portal and the NMNH data portal.

Keywords
COI, cox1, dark taxa, OTUs, BINs, natural history collection, museum harvesting