ARPHA Preprints, doi: 10.3897/arphapreprints.e107166
Deliverable D11.2 Search and link association services: A RESTful API, which will input a link/accession number and return a ranked list of neighbours links with a confidence score
Soulaine Theocharides, Niki Kyriakopoulou
‡ Naturalis Biodiversity Center, Leiden, Netherlands
Open Access
Abstract

Work package 11 of the BiCIKL project develops software tools to support a FAIR experience for members of the biodiversity research community. Overall, the work package focuses on Findability, by providing tools to search and answer questions, and on Accessibility, by developing links across biodiversity data sources and research tools. Task 11.2 specifically concerns the prediction of new links using machine learning. We chose to demonstrate machine learning link prediction with plant-pollinator interactions, both because of the wealth of data available, particularly in the Global Biotic Interactions (GloBI) database, and because of the ecological and economic significance of these interactions. The result is a RESTful API capable of predicting plant-pollinator interactions among a predefined set of species. Predictions are made on the fly, at the time of the request. The GitHub repository for the API is available at https://github.com/DiSSCo/BiCIKL_Linkages_API

The API takes either a plant or a pollinator as input and returns potential partners whose confidence score meets a user-defined minimum. Predictions are powered by a random forest classifier stored on disk. The classifier was trained on the taxonomic hierarchies of observed plant-pollinator pairs obtained from the GloBI database. When evaluating the likelihood of an interaction, the trained classifier takes the taxonomic hierarchy of both the plant and the pollinator and outputs a confidence score. Which pairs are returned is determined by the minimum confidence score set by the user.
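The request flow described above can be illustrated with a minimal Python sketch. It is not the API's actual code: the species names, taxonomy fields, score table, and the `predict_partners` and `toy_score` functions are all hypothetical, and the random forest classifier is replaced by a stand-in lookup so that the example runs without external libraries. It shows only how each candidate pair is scored, filtered by the minimum confidence, and ranked.

```python
from typing import Callable, Dict, List, Tuple

Taxonomy = Dict[str, str]

# Hypothetical predefined species set with a (truncated) taxonomic hierarchy.
PLANTS: Dict[str, Taxonomy] = {
    "plant-A": {"family": "PlantFamily1", "genus": "PlantGenus1"},
    "plant-B": {"family": "PlantFamily2", "genus": "PlantGenus2"},
}
POLLINATORS: Dict[str, Taxonomy] = {
    "pollinator-X": {"family": "PollFamily1", "genus": "PollGenus1"},
    "pollinator-Y": {"family": "PollFamily2", "genus": "PollGenus2"},
}

# Made-up scores standing in for the trained classifier's output.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("PlantFamily1", "PollFamily1"): 0.9,
    ("PlantFamily1", "PollFamily2"): 0.6,
    ("PlantFamily2", "PollFamily1"): 0.2,
}


def toy_score(plant: Taxonomy, pollinator: Taxonomy) -> float:
    """Stand-in for the classifier: confidence from the two taxonomies."""
    return TOY_SCORES.get((plant["family"], pollinator["family"]), 0.0)


def predict_partners(
    query: str,
    min_confidence: float,
    score: Callable[[Taxonomy, Taxonomy], float] = toy_score,
) -> List[Tuple[str, float]]:
    """Return partners of `query` scoring at least `min_confidence`, best first."""
    if not 0.0 <= min_confidence <= 1.0:
        raise ValueError("min_confidence must lie in [0, 1]")

    results: List[Tuple[str, float]] = []
    if query in PLANTS:
        # Query is a plant: score it against every known pollinator.
        for name, taxonomy in POLLINATORS.items():
            results.append((name, score(PLANTS[query], taxonomy)))
    elif query in POLLINATORS:
        # Query is a pollinator: score every known plant against it.
        for name, taxonomy in PLANTS.items():
            results.append((name, score(taxonomy, POLLINATORS[query])))
    else:
        raise KeyError(f"{query!r} is not in the predefined species set")

    # Keep only pairs meeting the user's threshold, then rank by confidence.
    kept = [pair for pair in results if pair[1] >= min_confidence]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept
```

In a real deployment, `score` would presumably wrap the classifier loaded from disk, for example by encoding both taxonomies as features and reading off the predicted probability of the interaction class; that encoding is not specified here and is left as an assumption.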

Keywords
BiCIKL, RESTful API