Beyond word clouds – NLP applications in challenging cultural contexts

Event: Seminar
Presented by: Jana Götze from Staatsbibliothek zu Berlin
Date: 10 June 2026
Time: 13:15-15:00
Venue: Gothenburg University, Humanisten and online
Address: Renströmsgatan 6, 412 55 Göteborg
Room: J335
Zoom link: https://gu-se.zoom.us/j/67063108947?pwd=kPpjvMLCekxNTBVzq4uYP5gFZ6Y6vd.1
Slides:

Abstract

In recent years we have seen rapid improvement of ML models at all levels of language (and related) processing. These models are typically developed using a wide range of benchmark datasets that rarely replicate the conditions relevant in a library context: noisy OCR, diachronic language variation, and heterogeneous historical documents. NLP research and development in libraries includes tasks such as named entity recognition, entity linking, semantic search, and summarization, and it presents both opportunities and challenges that can help us understand the limits of current models. This talk gives an overview of the data, the processing challenges, and the ongoing research at Staatsbibliothek zu Berlin that aims to make the library's data available to its users and to humanities researchers as more than just text (or word clouds).