ARPHA Preprints, doi: 10.3897/arphapreprints.e113335
Developing responsible AI practices at the Smithsonian Institution
Rebecca Dikow, Corey DiPietro§, Michael G Trizna, Hanna BredenbeckCorp§, Madeline G Bursell|, Jenna T B Ekwealor, Richard G J Hodel#, Nilda Lopez¤, William J B Mattingly, Jeremy Munro«, Richard M Naples¤, Candace Oubre», Drew Robarge§, Sara Snyder˄, Jennifer L Spillane, Melinda J Tomerlin˅, Luis J Villanueva¦, Alexander E White
‡ Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, United States of America
§ National Museum of American History, Smithsonian Institution, Washington, DC, United States of America
| Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States of America
¶ Department of Biology, San Francisco State University, San Francisco, CA, United States of America
# National Museum of Natural History, Smithsonian Institution, Washington, DC, United States of America
¤ Smithsonian Libraries and Archives, Smithsonian Institution, Washington, DC, United States of America
« National Air and Space Museum, Smithsonian Institution, Washington, DC, United States of America
» National Museum of African American History and Culture, Smithsonian Institution, Washington, DC, United States of America
˄ Office of Digital Transformation, Smithsonian Institution, Washington, DC, United States of America
˅ National Museum of Asian Art, Smithsonian Institution, Washington, DC, United States of America
¦ Digitization Program Office, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, United States of America
Open Access
Abstract

Applications of artificial intelligence (AI) and machine learning (ML) have become pervasive in our everyday lives. These applications range from the mundane (asking ChatGPT to write a thank-you note) to high-end science (predicting future weather patterns in the face of climate change), but because they rely on human-generated or human-mediated data, they also have the potential to perpetuate systemic oppression and racism. Museums and other cultural heritage institutions have great interest in automating the kinds of tasks at which AI and ML can excel, including computer vision tasks such as image segmentation and object recognition (labeling or identifying objects in an image) and natural language processing tasks such as named-entity recognition, topic modeling, and the generation of word and sentence embeddings, in order to make digital collections and archives discoverable, searchable, and appropriately tagged.

A coalition of staff, fellows, and interns working in digital spaces at the Smithsonian Institution, who either conduct research using AI or ML tools or work closely with digital data in other ways, came together to discuss the promise and potential peril of applying AI and ML at scale. This work results from those conversations. Here we present the process that led to the development of an AI Values Statement and an implementation plan, which includes releasing datasets with accompanying documentation (dataset cards) so that these data can be used with improved context and reproducibility. We plan to continue releasing dataset cards and, for AI and ML applications, model cards, in order to enable informed usage of Smithsonian data and research products.

Keywords
artificial intelligence, machine learning, GLAM, galleries, libraries, archives, museums, collections