CLASP
The Centre for Linguistic Theory and Studies in Probability

Discourse models with language models

Abstract

How are sentences in a document connected, and why do they make the document feel “coherent”? Computational models of discourse aim to answer these questions by recovering the structural organization of texts, through which writers convey intent and meaning. In the first part of this talk, I will discuss our work on modeling human curiosity through question generation, and on understanding its connection with discourse representations based on the linguistic theory of Questions Under Discussion. We show that LLMs, with careful design and training, can surface curiosity-driven questions and ground both their elicitation and their answers in text. Next, I will demonstrate how such generative discourse models can be used to measure discourse similarities in LLM-generated texts, as well as to derive explainable measures of information salience in LLMs using summarization as a behavioral probe.