ARPHA Preprints, doi: 10.3897/arphapreprints.e121727
Computable Species Descriptions and Nanopublications: Applying Ontology-based Technologies to Dung Beetles (Coleoptera: Scarabaeinae)
Giulio Montanaro, James P. Balhoff§, Jennifer C. Girón|, Max Söderholm, Sergei Tarasov
‡ Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
§ RENCI, University of North Carolina, Chapel Hill, North Carolina, United States of America
| Museum of Texas Tech University, Texas, United States of America
Open Access
Abstract

Taxonomy has long struggled to analyze vast amounts of phenotypic data because of computational and accessibility challenges. Ontology-based technologies provide a framework for modeling semantic phenotypes that are machine-interpretable and compliant with the FAIR principles. In this paper, we explore the use of Phenoscript, an emerging language designed for creating semantic phenotypes, to produce computable species descriptions. Our case study centers on applying this approach to dung beetles (Coleoptera: Scarabaeinae).

We illustrate the effectiveness of Phenoscript for creating semantic phenotypes. We also demonstrate the ability of the Phenospy Python package to automatically translate Phenoscript descriptions into natural language (NL), eliminating the need to write traditional NL descriptions by hand. We introduce a computational pipeline that streamlines the generation of semantic descriptions and their conversion to NL. To demonstrate the power of the semantic approach, we apply simple semantic queries to the generated phenotypic descriptions. This paper addresses the current challenges in crafting semantic species descriptions and outlines a path towards future improvements. Furthermore, we discuss the promising integration of semantic phenotypes with nanopublications, both emerging methods for sharing scientific information. Overall, our study highlights the pivotal role of ontology-based technologies in modernizing taxonomy and aligning it with the evolving landscape of big data analysis and the FAIR principles.
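To give a sense of what a "simple semantic query" over phenotype data can involve, the following is a minimal, hypothetical sketch in plain Python. It assumes that phenotypes have already been reduced to subject-predicate-object triples; the entity names, the relation labels `instance_of`, `part_of` and `bearer_of` (loosely mirroring relations common in OBO ontologies), and the helper functions are all illustrative assumptions. This is not Phenoscript syntax, not Phenospy output, and not the query method used in the study; a real pipeline would typically query OWL/RDF data with a reasoner or SPARQL. The point it shows is that, because `part_of` is followed transitively, a black pronotum is found by a query asking for "a black structure within the thorax" even though no triple states this directly.

```python
# Hypothetical, simplified phenotype triples. Plain strings stand in for
# ontology IRIs (anatomy terms, quality terms); this is NOT Phenoscript syntax.
TRIPLES = [
    ("organism_A", "instance_of", "organism"),
    ("a_thorax", "instance_of", "thorax"),
    ("a_thorax", "part_of", "organism_A"),
    ("a_pronotum", "instance_of", "pronotum"),
    ("a_pronotum", "part_of", "a_thorax"),
    ("a_pronotum", "bearer_of", "black"),
    ("organism_B", "instance_of", "organism"),
    ("b_elytron", "instance_of", "elytron"),
    ("b_elytron", "part_of", "organism_B"),
    ("b_elytron", "bearer_of", "black"),
    ("b_thorax", "instance_of", "thorax"),
    ("b_thorax", "part_of", "organism_B"),
    ("b_pronotum", "instance_of", "pronotum"),
    ("b_pronotum", "part_of", "b_thorax"),
    ("b_pronotum", "bearer_of", "brown"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def containers(entity):
    """Follow part_of transitively: every entity that `entity` is part of."""
    seen, stack = set(), [entity]
    while stack:
        for parent in objects(stack.pop(), "part_of"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def organisms_with(quality, within_type=None):
    """Organisms having a structure that bears `quality`, optionally
    restricted to structures lying (transitively) inside a `within_type`."""
    hits = set()
    for s, p, o in TRIPLES:
        if p != "bearer_of" or o != quality:
            continue
        wholes = containers(s)
        if within_type is not None and not any(
            within_type in objects(w, "instance_of") for w in wholes
        ):
            continue  # the bearer is not inside the requested structure
        hits |= {w for w in wholes if "organism" in objects(w, "instance_of")}
    return hits


if __name__ == "__main__":
    print(sorted(organisms_with("black")))                        # any black part
    print(sorted(organisms_with("black", within_type="thorax")))  # black part in thorax
```

Run as a script, the first query returns both organisms, while the second returns only `organism_A`, since organism B's black structure (the elytron) is not part of its thorax. Writing the same question against free-text NL descriptions would require manual reading or fragile keyword matching, which is the gap that ontology-based descriptions aim to close.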

Keywords
Phenoscript, taxonomy, semantic data, phenotypic traits, characters, morphology, Grebennikovius, microCT