Comprehensively Evaluating Language in Language Models
- Event: Seminar
- Presented by: Leonie Weissweiler from Uppsala University
- Date: 04 February 2026
- Time: 13:15-15:00
- Venue: Gothenburg University, Humanisten and online
- Address: Renströmsgatan 6, 412 55 Göteborg
- Room: J336
- Zoom link: https://gu-se.zoom.us/j/67063108947?pwd=kPpjvMLCekxNTBVzq4uYP5gFZ6Y6vd.1
- Slides:
Abstract
As Large Language Models (LLMs) are increasingly used in high-stakes situations, it is vital that we accurately assess not only their strengths but also their limitations. To this end, I ask: how can we ensure that we neither over- nor underestimate language models' linguistic capabilities? Evaluations must consider the full breadth of human language. In my talk, I will demonstrate how progress can be made towards this goal in two respects: multilingual evaluation, and evaluation for the long tail of language. For multilingual evaluation, I will show how agreement evaluation can be scaled to over 100 languages. For the long tail of language, I will report results from two investigations of language models' understanding of the so-that construction, with which even state-of-the-art models struggle despite the rich distributional information available in their training data. I will also demonstrate how LLMs themselves can be leveraged to annotate corpora for long-tail constructions, further stretching the boundaries of what we are able to test. Together, these evaluations paint a nuanced picture of the linguistic capabilities of large language models, showing achievements as well as remaining deficits.
