ARPHA Preprints, doi: 10.3897/arphapreprints.e176591
Automated Extraction of Fungal Trophic Modes from Literature Using BioBERT: An Open Pilot Workflow
Beatrice Margareta Bock§
‡ Department of Biological Sciences, Northern Arizona University, Flagstaff, United States of America
§ Center for Adaptable Western Landscapes, Northern Arizona University, Flagstaff, United States of America
Open Access
Abstract

Fungi exhibit diverse trophic strategies, ranging from obligate symbiosis to saprotrophy, and some taxa can occupy multiple ecological roles. Identifying this trophic versatility manually from the literature is time-consuming and difficult to scale. Here, we present a pilot workflow that automates the classification of fungal trophic modes using BioBERT, a transformer-based language model pre-trained on biomedical literature. A curated dataset of 56 fungal ecology abstracts was manually labeled as dual (occupying multiple trophic modes) or solo (restricted to one mode) and used to fine-tune BioBERT for binary classification. The model achieved 86% accuracy with balanced precision and recall, indicating that a fine-tuned language model can reproduce manual, literature-based trait assignments on a small dataset. This pilot study emphasizes reproducibility, transparency, and open data integration, and offers a proof of concept for linking literature-derived ecological information to existing fungal trait databases such as FUNGuild and FungalTraits. All code, data, and trained models are openly available to support reuse and scaling to larger datasets.
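
To make the reported evaluation metrics concrete, the following is a minimal sketch of how accuracy, precision, and recall can be computed for a dual/solo labeling task. It is not the study's evaluation code: the function name `binary_metrics` is hypothetical, "dual" is assumed to be the positive class, and the labels are toy values invented for illustration rather than drawn from the 56-abstract dataset.

```python
from typing import Dict, List


def binary_metrics(
    y_true: List[str], y_pred: List[str], positive: str = "dual"
) -> Dict[str, float]:
    """Accuracy, precision, and recall for a two-class labeling task.

    `positive` names the class treated as the positive label
    (assumed here to be "dual"; this is an illustrative choice).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    return {
        "accuracy": correct / len(pairs),
        # Fall back to 0.0 when a denominator is empty.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels for illustration only (not the study's data).
toy_true = ["dual", "dual", "solo", "solo", "dual", "solo", "solo", "dual"]
toy_pred = ["dual", "solo", "solo", "solo", "dual", "dual", "solo", "dual"]
toy_scores = binary_metrics(toy_true, toy_pred)
print(toy_scores)
```

In this toy case the model makes one false positive and one false negative, so precision and recall are equal, which is the sense in which the two metrics are "balanced". With a dataset as small as 56 abstracts, a single misclassified held-out example can shift each metric by several percentage points.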

Keywords
fungal ecology, trophic modes, natural language processing, machine learning, trait databases, BioBERT, saprotrophy-symbiosis continuum