ARPHA Preprints, doi: 10.3897/arphapreprints.e114920
Milestone MS32 The design and prototype of a workflow integrating Wikidata into validation and linking
Mathias Dillen‡, Andreas Plank§
‡ Meise Botanic Garden, Meise, Belgium
§ Botanical Garden and Botanical Museum, Berlin, Germany
Open Access
Abstract

The aim of this task is to develop a workflow that facilitates linking collector name strings to persistent identifiers (PIDs) for those collectors. Such a workflow should help scale up the number of links being made, make the process more efficient, and take as much advantage as possible of existing work and infrastructures, so as not to reinvent the wheel. The work can be roughly split into three subtasks:
- Make existing linking workflows more easily implementable in other contexts and by other infrastructures. This includes finding ways for such workflows to produce links that can easily be published, i.e. in a standardised format compatible with existing infrastructure. The suitability of different infrastructures for making established links available should also be assessed.
- Establish, document and improve the comprehensiveness, findability and interoperability of the content in PID-minting resources, in particular Wikidata as it can be edited openly.
- Refine the decision-making process of establishing links, by implementing and improving the methods that can be used to validate potential links.
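
To make the idea of validating a potential link more concrete, the following is a minimal sketch in Python and not the workflow developed in this task. It assumes two simple checks: a string similarity between the collector name string and the candidate's label, and a plausibility test of the collection year against the candidate's lifespan. The `Candidate` class, the function names, the similarity threshold of 0.8 and the minimum collecting age of 10 years are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Candidate:
    """A hypothetical candidate person record, e.g. retrieved from Wikidata."""
    pid: str                    # persistent identifier of the person
    label: str                  # preferred name of the person
    birth_year: Optional[int]   # None when unknown
    death_year: Optional[int]   # None when unknown


def name_similarity(name_string: str, label: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, name_string.lower(), label.lower()).ratio()


def dates_plausible(candidate: Candidate, collection_year: int,
                    min_age: int = 10) -> bool:
    """Reject candidates who were too young, or no longer alive, when collecting.

    Unknown birth or death years are not treated as evidence against a link.
    """
    if (candidate.birth_year is not None
            and collection_year < candidate.birth_year + min_age):
        return False
    if (candidate.death_year is not None
            and collection_year > candidate.death_year):
        return False
    return True


def validate_link(name_string: str, candidate: Candidate,
                  collection_year: int, threshold: float = 0.8) -> bool:
    """Accept a potential link only if both checks pass (assumed rule)."""
    return (name_similarity(name_string, candidate.label) >= threshold
            and dates_plausible(candidate, collection_year))


# Usage with a made-up person; the PID is a placeholder, not a real record.
person = Candidate(pid="Q_PLACEHOLDER", label="Jane Example",
                   birth_year=1850, death_year=1920)
accepted = validate_link("jane example", person, collection_year=1890)
rejected = validate_link("jane example", person, collection_year=1930)
```

In practice, a name comparison would probably need to handle initials, abbreviations and name order, which a plain similarity ratio does not, so a link passing these checks is better treated as a candidate for review than as confirmed.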

In this document, the focus lies on linking people. We will propose a workflow to 'roundtrip' links established through the Bionomia platform back to the collections holding the attributed specimens, and to make these links available for use by other BiCIKL infrastructures. We will also refine existing automated linking workflows and pilot the new functionalities on the (botanical) collections of the task partners. These refinements will be informed by an assessment of the current state of Wikidata, investigated through shape expressions constructed from commonly used queries and from Wikidata records which have been linked in previous efforts such as the Botany Pilot, Bionomia and specimen data published to GBIF.
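
As a rough illustration of how a shape can be used to assess the state of person records, the sketch below checks a record against a small set of expected properties. It is written in Python as a stand-in and is not a shape expression itself, nor the shapes constructed in this task. The property identifiers P31 (instance of), P569 (date of birth) and P570 (date of death), and the item Q5 (human), exist in Wikidata, but the choice to require exactly these is an assumption, as are the record layout and the function names.

```python
from typing import Dict, List

# Properties a person record is expected to carry in this sketch.
# Which properties to require is an assumption made for illustration.
EXPECTED_PROPERTIES: Dict[str, str] = {
    "P31": "instance of",
    "P569": "date of birth",
    "P570": "date of death",
}

HUMAN = "Q5"  # Wikidata item for 'human'


def missing_properties(record: Dict[str, List[str]]) -> List[str]:
    """Return the expected property IDs that are absent or have no values."""
    return [pid for pid in EXPECTED_PROPERTIES if not record.get(pid)]


def conforms(record: Dict[str, List[str]]) -> bool:
    """True if all expected properties are present and the record is a human."""
    return not missing_properties(record) and HUMAN in record.get("P31", [])


# Usage with made-up records (hypothetical layout: property ID -> values).
complete = {"P31": ["Q5"], "P569": ["1850"], "P570": ["1920"]}
incomplete = {"P31": ["Q5"], "P569": ["1850"]}

gaps = missing_properties(incomplete)
```

Listing the missing properties, rather than only reporting conformance, indicates which statements would need to be added to a record before it can support date-based validation.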

Keywords
Semantic enrichment, PIDs, matching, roundtripping, Bionomia, GBIF, Botany Pilot