Goal

The goal of this project is to model reference and co-reference resolution in visual dialogue by building upon the existing work on situated dialogue and the document/textual domain in this area.

Background

Situated dialogue involves language and vision. An important aspect of processing situated dialogue is to resolve the reference of linguistic expressions. The challenging aspect is that descriptions are local to the current dialogue and visual context of the conversation and that not all information is expressed linguistically as a lot of meaning can be recovered from the joint visual and dialogue attention.

Co-reference resolution has been studied and modelled extensively in the textual domain where the scope of the processing co-reference is within a document. Robust co-reference resolution for dialogue systems is a very much needed task but using standard textual co-reference tools in this domain presents several challenges and open questions.

Problem description

  • T1. Extend the Cups corpus [4] by collecting data on Mechanical Turk.
  • T2. Annotate the corpus with co-reference chains either (a) manually or (b) through Mechanical Turk.
  • T3. Depending on the previous steps, for (a) do a corpus and statistical feature analysis of the annotations in relation to their predictability of co-reference, or for (b) train a supervised text and vision classification model for reference and co-reference resolution.
  • T4. Compare reference and co-reference startegies between different corpora.

See S. Dobnik, N. Ilinykh, and A. Karimi. What to refer to and when? reference and re-reference in two language-and-vision tasks. In E. Gregoromichelaki, J. Hough, and J. D. Kelleher, editors, Proceedings of DubDial - Semdial 2022: The 26th Workshop on the Semantics and Pragmatics of Dialogue, Proceedings (SemDial), pages 146–159, Dublin, Ireland, August 22–24 2022.

image

N. Ilinykh, S. Zarrieß, and D. Schlangen. Tell me more: A dataset of visual scene description sequences. In Proceedings of the 12th International Conference on Natural Language Generation, pages 152–157, Tokyo, Japan, Oct.–Nov. 2019. Association for Computational Linguistics.

image

For students in Masters in Language Technology (MLT), the master thesis would build on the following courses

  • Computational semantics
  • Dialogue systems
  • Machine learning
  • AI: Cognitive systems

or equivalent knowledge and skills for external students.

A potential candidate should be comfortable in programming with Python, machine learning and text and image processing tools and techniques.

Supervisors

Simon Dobnik, Nikolai Ilinykh, and Aram Karimi