This is a PhD course that explores computational modelling of language and vision, in particular in relation to situated dialogue agents and image classification. It may partially overlap with a parallel course at the master’s level: LT2308 ESLP: Embodied and Situated Language Processing or LT2318 Artificial Intelligence: Cognitive Systems.

The course surveys the theory and practical computational implementations of how natural language interacts with the physical world through action and perception. We will look at topics such as semantic theories and computational approaches to modelling natural language, action, and perception (grounding), situated dialogue systems, integrated robotic systems, grounding of language in action and perception, generation and interpretation of scene descriptions from images and videos, spatial cognition, and others.

As the course studies how humans structure and interact with the physical world and express this in language, it bridges into cognitive science, computer vision, and robotics, and therefore belongs more broadly to the field of cognitive artificial intelligence. Typical applications of computational models of language, action, and perception include image search and retrieval on the web, navigation systems that give more natural, human-like instructions, and personal robots and situated conversational agents that interact with us through language in our home environment.

The learning outcomes of the course are based on three topics: (i) the relation between language and perception in human interaction, (ii) how language and perception are modelled with formal and computational models and methods, and how these are integrated into different applications, and (iii) how research in the field is communicated scientifically.

The course webpage can be accessed here.

The course syllabus can be found here.

Language, Action, and Perception