CLASP
The Centre for Linguistic Theory and Studies in Probability

Understanding and Modelling Pronouns in Translation: Resources, Methods, Challenges and Insights

Abstract: The difficulty of pronoun translation is typically illustrated with examples of anaphoric pronouns requiring gender agreement in the target language. However, pronoun translation is more complex than that. In this talk, I present our efforts to understand and model the generation and interpretation of pronouns in translation. A core resource is the ParCorFull corpus, a multilingual parallel dataset with a rich annotation of coreferential phenomena going beyond simple anaphoric references. ParCorFull has found a range of applications, from the cross-lingual study of texts to machine translation evaluation. These have led to insights into translation processes, but have also uncovered challenges arising from how corpus annotation resolves ambiguity, which can create conflicts in parallel data. Additional insights can be gained from studies of pronoun generation and interpretation that we have conducted with human participants, highlighting the variance of typical patterns across five European languages. I also present our work on modelling pronoun translation in the context of cross-lingual coreference resolution and neural machine translation with the help of cross-lingual mention attention, resulting in consistent but rather modest performance gains. If time permits, I may also talk briefly about our more recent work on evidential deep learning for uncertainty estimation.