ARPHA Preprints, doi: 10.3897/arphapreprints.e125974

Retrieving biodiversity data from multiple sources: making secondary data standardized and accessible

Nubia Marques^‡, Carla Danielle de Melo Soares^‡, Daniel de Melo Casali^‡, Erick Cristofore Guimarães^‡, Fernanda Guimarães Fava^‡, João Marcelo da Silva Abreu^‡, Ligiane Martins Moras^‡, Leticia Gomes^‡, Raphael Matias^‡, Rafael Leandro de Assis^‡, Rafael Fraga^‡, Sara Miranda Almeida^‡, Vanessa Guimarães Lopes^‡, Rafaela Missagia^‡, Eduardo Costa Carvalho^‡, Nikolas Jorge Carneiro^‡, Ronnie Alves^‡, Pedro Martins Souza^‡, Guilherme Oliveira^§, Valeria Da Cunha Tavares^‡

‡ Vale Institute of Technology, Belém, Brazil
§ Instituto Tecnológico Vale, Vale, Brazil

Corresponding author: Nubia Marques (nubia.marques@pq.itv.org)

© Nubia Marques, Carla Soares, Daniel Casali, Erick Guimarães, Fernanda Fava, João Abreu, Ligiane Moras, Leticia Gomes, Raphael Matias, Rafael Assis, Rafael Fraga, Sara Almeida, Vanessa Lopes, Rafaela Missagia, Eduardo Carvalho, Nikolas Carneiro, Ronnie Alves, Pedro Souza, Guilherme Oliveira, Valeria Tavares.

This is an open access preprint distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Citation: Marques N, Soares CdeM, Casali DM, Guimarães E, Fava F, Abreu JdaS, Moras L, Gomes L, Matias R, Assis Rde, Fraga R, Almeida S, Lopes V, Missagia R, Carvalho E, Carneiro N, Alves R, Souza P, Oliveira G, Tavares VC (2024) Retrieving biodiversity data from multiple sources: making secondary data standardized and accessible. ARPHA Preprints. https://doi.org/10.3897/arphapreprints.e125974

Abstract

Biodiversity data, particularly species occurrence and abundance, are indispensable for testing empirical hypotheses in the natural sciences. However, datasets built for research programs often do not meet the FAIR (findable, accessible, interoperable, and reusable) principles, which raises questions about data quality, accuracy, and availability. The 21st century has marked a new era for data science and analytics, and efforts to aggregate, standardize, filter, and share biodiversity data from multiple sources have become increasingly necessary. In this study, we propose a framework for refining secondary biodiversity data and conforming them to FAIR standards, making them available for uses such as macroecological modeling and other studies. We relied on a Darwin Core base model to standardize the data and to facilitate their curation and validation, including data on the occurrence and abundance of multiple taxa from a region that encompasses estuarine ecosystems in an ecotonal area bordering easternmost Amazonia. We further discuss the significance of feeding standardized public data repositories to advance scientific progress and highlight their role in contributing to biodiversity management and conservation.

Keywords

Darwin Core standard, FAIR data, Golfão Maranhense, secondary data