Corresponding author: Beatrice Bock (beabockm@gmail.com) © Beatrice Bock. This is an open access preprint distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Citation:
Bock B (2025) Automated Extraction of Fungal Trophic Modes from Literature Using BioBERT: An Open Pilot Workflow. ARPHA Preprints. https://doi.org/10.3897/arphapreprints.e176591
Fungi exhibit diverse trophic strategies, ranging from obligate symbiosis to saprotrophy, and some taxa can occupy multiple ecological roles. Identifying this trophic versatility manually from the literature is time-consuming and difficult to scale. Here, we present a pilot workflow that automates the classification of fungal trophic modes using BioBERT, a transformer-based language model pre-trained on biomedical literature. A curated dataset of 56 fungal ecology abstracts was manually labeled as dual (occupying multiple trophic modes) or solo (restricted to one mode) and used to fine-tune BioBERT for binary classification. The model achieved 86% accuracy with balanced precision and recall, suggesting that machine learning can reproduce manual, literature-based trait assignments. This pilot study emphasizes reproducibility, transparency, and open data integration, and offers a proof of concept for linking literature-derived ecological information to existing fungal trait databases such as FUNGuild and FungalTraits. All code, data, and trained models are openly available to support reuse and scaling to larger datasets.
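To make the reported evaluation metrics concrete, the following minimal sketch shows how accuracy, precision, and recall could be computed for a dual/solo classifier. It is an illustration only and not the workflow's actual evaluation code: the function name `binary_metrics`, the choice of "dual" as the positive class, and the toy labels are all assumptions introduced here, and the toy labels are not the study's data.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="dual"):
    """Return accuracy, precision, and recall for one positive class.

    Hypothetical helper for illustration; treating "dual" as the
    positive class is an assumption, not taken from the preprint.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: true positive or false positive
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: false negative or true negative
            counts["fn" if truth == positive else "tn"] += 1

    total = sum(counts.values())
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total if total else 0.0,
        "precision": counts["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": counts["tp"] / actual_pos if actual_pos else 0.0,
    }


# Made-up toy labels for six abstracts (NOT the study's data)
y_true = ["dual", "dual", "solo", "solo", "dual", "solo"]
y_pred = ["dual", "solo", "solo", "solo", "dual", "dual"]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With only 56 labeled abstracts, each misclassified example shifts these metrics noticeably, so per-class precision and recall are worth reporting alongside accuracy.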