Corresponding author: Wouter Addink (wouter.addink@naturalis.nl) © Soulaine Theocharides, Sam Leeflang, Wouter Addink, Sharif Islam. This is an open access preprint distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Citation:
Theocharides S, Leeflang S, Addink W, Islam S (2025) Digital Object Interface Protocol (DOIP) enabled Digital Object repository installation to store and provide digital specimen information. ARPHA Preprints. https://doi.org/10.3897/arphapreprints.e157339
Biodiversity research relies on physical specimens stored in natural science collections, which serve as enduring reservoirs of data about organisms and their environments. However, these reservoirs remain siloed. The Digital Specimen concept addresses the challenges posed by the vast amount of disconnected digital biodiversity data available today. The existing approach converts analogue records into digital replicas stored in local databases, which produces isolated and fragmented datasets that are difficult to integrate and use efficiently. The Digital Specimen aims to overcome this by establishing an interconnected network of digital objects on the Internet.
Digital Specimens are FAIR Digital Objects (FDOs): structured digital entities that adhere to the FAIR principles of being Findable, Accessible, Interoperable, and Reusable. The Distributed System of Scientific Collections (DiSSCo) uses the FDO framework to improve the accessibility and interoperability of biodiversity research data from natural science collections. FDOs facilitate data exchange by providing structured digital objects with unique identifiers, descriptive metadata, and defined operations. As part of making Digital Specimens FDOs, DiSSCo implemented FDO records, which are metadata records associated with a Persistent Identifier and which further enable machine actionability.
A Digital Object repository was developed to store and act upon digital specimens. The repository rests on three technological pillars: a relational database stores the latest version of each digital specimen and is used for retrieving specimens by their identifier; an indexing solution provides full search capabilities on digital specimens; and a document store holds previous versions of a digital specimen for provenance purposes. A user may interact with the digital object repository in three ways: through a REST API, a user-friendly web portal, or a DOIP server.
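The division of labour between the three storage pillars can be illustrated with a minimal in-memory sketch. It is not the DiSSCo implementation: the class `DigitalSpecimenRepository`, its method names, and the identifier `example/pid-1` are hypothetical, and plain Python dictionaries stand in for the relational database, the search index, and the document store.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class DigitalSpecimenRepository:
    """Hypothetical in-memory stand-in for the three storage pillars."""

    # Role of the relational database: latest version, keyed by identifier.
    latest: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Role of the indexing solution: lower-cased token -> identifiers.
    index: Dict[str, Set[str]] = field(default_factory=dict)
    # Role of the document store: superseded versions, oldest first.
    history: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)

    @staticmethod
    def _tokens(specimen: Dict[str, Any]) -> Set[str]:
        # Split every field value into lower-cased words for searching.
        return {t for v in specimen.values() for t in str(v).lower().split()}

    def save(self, pid: str, specimen: Dict[str, Any]) -> int:
        """Store a new version and return its version number (starting at 1)."""
        previous = self.latest.get(pid)
        if previous is not None:
            # Archive the superseded version and drop its index entries.
            self.history.setdefault(pid, []).append(previous)
            for token in self._tokens(previous):
                self.index[token].discard(pid)
        self.latest[pid] = dict(specimen)
        for token in self._tokens(specimen):
            self.index.setdefault(token, set()).add(pid)
        return len(self.history.get(pid, [])) + 1

    def get(self, pid: str) -> Dict[str, Any]:
        """Retrieve the latest version by identifier."""
        return self.latest[pid]

    def search(self, term: str) -> List[str]:
        """Return identifiers whose latest version contains the term."""
        return sorted(self.index.get(term.lower(), set()))

    def versions(self, pid: str) -> List[Dict[str, Any]]:
        """Return previous versions for provenance purposes."""
        return list(self.history.get(pid, []))


repo = DigitalSpecimenRepository()
repo.save("example/pid-1", {"name": "Panthera leo"})
repo.save("example/pid-1", {"name": "Panthera tigris"})
print(repo.get("example/pid-1"), repo.search("tigris"), repo.versions("example/pid-1"))
```

In this sketch only the latest version is searchable, while earlier versions remain retrievable for provenance. Whether the actual repository indexes historical versions as well is not stated here.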
To ingest data from multiple source systems, a harmonised data model called OpenDS was developed. Built upon existing international standards such as Darwin Core and ABCD, OpenDS accommodates the complex structures needed to capture information about multiple taxonomic identifications, events, agents, and relationships to other data sources. DiSSCo has decided to adapt the GBIF Unified Model (UM) for specimen data rather than develop a potentially competing standard. Aligning with the GBIF UM promotes a unified data modelling standard within the biodiversity community and facilitates data exchange and integration with data aggregators such as GBIF.
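The kind of nesting such a model has to support can be sketched with a few Python dataclasses. This is an illustration only and does not reproduce the OpenDS schema: every class name, field name, and sample value below is hypothetical, chosen to show one specimen carrying several identifications, events, and relationships.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Identification:
    scientific_name: str
    identified_by: Optional[str] = None  # agent who made the identification


@dataclass
class Event:
    event_type: str
    locality: Optional[str] = None


@dataclass
class Relationship:
    relationship_type: str
    related_resource: str  # identifier or URL of the other data source


@dataclass
class SpecimenRecord:
    pid: str
    identifications: List[Identification] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


record = SpecimenRecord(
    pid="example/pid-1",
    identifications=[
        Identification("Panthera leo", identified_by="Agent A"),
        Identification("Panthera tigris", identified_by="Agent B"),
    ],
    events=[Event("collecting", locality="Example locality")],
    relationships=[Relationship("hasSourceRecord", "https://example.org/record/1")],
)

# asdict() flattens the nested dataclasses into plain dicts and lists,
# which is the shape a JSON serialiser would need.
document = asdict(record)
print(len(document["identifications"]), document["events"][0]["event_type"])
```

Keeping identifications, events, and relationships as lists rather than single fields lets a record hold competing or successive determinations without overwriting earlier ones.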