ARPHA Preprints, doi: 10.3897/arphapreprints.e160486
Extraction of Quantitative Specimen Data using Machine Learning as a Service in the DiSSCo Research Infrastructure
Rajapreethi Rajendran, Claus Weiland, Jonas Grieb, Soulaine Theocharides§, Sam Leeflang§|, Wouter Addink|§, Sharif Islam§
‡ Senckenberg – Leibniz Institution for Biodiversity and Earth System Research, Frankfurt am Main, Germany
§ Naturalis Biodiversity Center, Leiden, Netherlands
| Distributed System of Scientific Collections - DiSSCo, Leiden, Netherlands
¶ DiSSCo, Leiden, Netherlands
Open Access
Abstract

The Distributed System of Scientific Collections (DiSSCo) is a research infrastructure that digitally integrates European natural science collections (NSCs). It aims to facilitate and enhance access to, and management and analysis of, collection assets in one unified digital collection. Machine Annotation Services (MAS) are essential components of DiSSCo’s Digital Specimen Architecture (DSArch). These services automate the annotation of digital objects, enabling the labeling and categorization of NSCs' digital assets.

To advance this further, a Machine Learning as a Service (MLaaS) approach was developed that gives researchers access to pre-trained machine learning models for complex tasks such as instance segmentation and morphological analysis of datasets. MLaaS enhances DiSSCo’s scalability and flexibility and allows machine learning tools to be integrated in close alignment with the FAIR (Findable, Accessible, Interoperable, Reusable) principles.

This study employs DiSSCo's MLaaS framework for the quantitative analysis of herbarium specimens. Machine learning models such as Mask R-CNN and YOLO11 are applied and compared on detecting plant organs in herbarium sheets and generating pixel-level masks for them. The models are then used to reconstruct the scale of the herbarium sheet and to calculate the surface area of the identified plant organs.
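The step from a pixel-level mask and a reconstructed scale to a surface area can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the scale is recovered from two points on a detected ruler separated by a known physical length, and the function names `pixels_per_cm_from_ruler` and `mask_area_cm2` are hypothetical.

```python
import numpy as np


def pixels_per_cm_from_ruler(p0, p1, known_length_cm):
    """Estimate image scale from two (x, y) points a known distance apart.

    Assumption: p0 and p1 mark the ends of a ruler segment of
    known_length_cm centimetres detected in the herbarium sheet image.
    """
    if known_length_cm <= 0:
        raise ValueError("known_length_cm must be positive")
    distance_px = float(np.hypot(p1[0] - p0[0], p1[1] - p0[1]))
    return distance_px / known_length_cm


def mask_area_cm2(mask, pixels_per_cm):
    """Convert a binary instance mask to a surface area in square centimetres."""
    if pixels_per_cm <= 0:
        raise ValueError("pixels_per_cm must be positive")
    pixel_count = int(np.count_nonzero(mask))
    # One pixel covers (1 / pixels_per_cm)^2 square centimetres.
    return pixel_count / (pixels_per_cm ** 2)


# Toy example: a 20 x 40 pixel rectangular "leaf" mask.
leaf_mask = np.zeros((100, 100), dtype=bool)
leaf_mask[10:30, 20:60] = True

# Ruler segment of 10 cm spanning 500 pixels (a 3-4-5 triangle).
scale = pixels_per_cm_from_ruler((0, 0), (300, 400), known_length_cm=10.0)
area = mask_area_cm2(leaf_mask, scale)
```

In practice the mask would come from a segmentation model's output for one detected organ, and the accuracy of the area depends directly on how well the scale is reconstructed, since the scale enters the result squared.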

Based on our finding that YOLO11 outperforms Mask R-CNN for our use case, we deployed a YOLO11-based service as a MAS in DSArch to open up natural science collections at scale for research fields such as plant phenology and climate change science.

Keywords
Digital Specimen Architecture, plant organ detection, quantitative traits, deep learning, DiSSCo, image processing, instance segmentation, Mask R-CNN, YOLO11