APL HT18 and onwards, Language, Action, and Perception, 7.5 HEC, Språk, handling och perception, 7,5hp, part of Doctoral Degree in Computational Linguistics.
This is a PhD course that explores computational modelling of language and vision, in particular in relation to situated dialogue agents and image classification. It may partially overlap with two parallel courses at the master's level: LT2308 ESLP: Embodied and Situated Language Processing and LT2318: Artificial Intelligence: Cognitive Systems.
The course gives a survey of theory and practical computational implementations of how natural language interacts with the physical world through action and perception. We will look at topics such as semantic theories and computational approaches to modelling natural language, action and perception (grounding), situated dialogue systems, integrated robotic systems, grounding of language in action and perception, generation and interpretation of scene descriptions from images and videos, spatial cognition, and others.
As the course studies how humans structure and interact with the physical world and express it in language, it bridges into the domains of cognitive science, computer vision, and robotics, and therefore belongs more broadly to the field of cognitive artificial intelligence. Typical applications of computational models of language, action, and perception are image search and retrieval on the web, navigation systems that provide more natural, human-like instructions, and personal robots and situated conversational agents that interact with us in our home environment through language.
The learning outcomes of the course are based on covering three topics: (i) the relation between language and perception in human interaction, (ii) how language and perception are modelled with formal and computational models and methods and how these are integrated with different applications, and (iii) how research in the field is communicated scientifically.
- General admission requirements for a doctoral degree in Computational Linguistics or equivalent.
In order to follow the course, participants should have experience at master's level in at least one of the following fields:
- Formal semantics and pragmatics
- Natural language processing
- Computational semantics
- Machine learning
- or equivalent skills and knowledge.
Please read this document and talk to Simon.
- Simon Dobnik (course organiser), office hours: by appointment
For a list of suggested readings please see here. Individual readings will be suggested for each meeting.
Schedule and course materials
Contextual referring expressions
- 2020-06-12, Zoom
- Pezzelle, S., & Fernández, R. (2019). Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts. arXiv preprint arXiv:1908.10285. (recommended by Staffan) 2020-06-12
- Staffan (presenter), Tewodros, Maryam, Mehdi, Simon, and Robin
Visual question answering and background knowledge
- 2020-05-29, Zoom
- Talk: Míriam Sánchez-Alcón: The significance of applying attention to Visual Question Answering
- Wu, J., & Mooney, R. J. (2018). Faithful Multimodal Explanation for Visual Question Answering. arXiv [cs.CL]. (recommended by Simon) 2020-05-29
- Nikolai (presenter), Miriam (presenter), Tewodros, Robin, Staffan, Simon
Word complexity and concreteness, requirements for social and embodied NLP
- 2020-04-30, Zoom
- Talk: David Alfter: Visual features in textual complexity classification: a case study on pictograms
- Y. Bisk, A. Holtzman, J. Thomason, J. Andreas, Y. Bengio, J. Chai, M. Lapata, A. Lazaridou, J. May, A. Nisnevich, N. Pinto, and J. Turian. Experience grounds language. arXiv, arXiv:2004.10151 [cs.CL], 2020. 2020-04-30
- Robin, Staffan, Mehdi, Nikolai, Bill, Vlad, Tewodros, Maryam, David, Elena, Simon
Generating image descriptions, natural language generation
- 2020-04-17, Zoom
- J. Krause, J. Johnson, R. Krishna, and L. Fei-Fei. A hierarchical approach for generating descriptive image paragraphs. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3337–3345, July 21–26 2017.
- Nikolai (presenter), Mehdi, Robin, Vlad, Bill, Aram, Maryam, and Simon
Generating image descriptions and pragmatics
- 2020-03-20, Zoom
- Cohn-Gordon, R., Goodman, N., & Potts, C. (2018). Pragmatically Informative Image Captioning with Character-Level Inference.
- Nikolai (presenter), Mehdi, Robin, Vlad, Bill, Tewodros and Simon (check)
Spatial representations, representation learning, interpretability
- 2019-02-08, 10-12, Dicksonsgatan 4
- G. Collell, L. V. Gool, and M. Moens. Acquiring common sense spatial knowledge through implicit spatial templates. arXiv, arXiv:1711.06821 [cs.AI]:1–8, 2017.
- Mehdi (presenter), Felix, Vlad, Robin, Staffan, Simon
Language and action
- 2019-03-08, 10-12, Dicksonsgatan 4
- Forestier, S., & Oudeyer, P.-Y. (2017). A Unified Model of Speech and Tool Use Early Development. Proceedings of the 39th Annual Meeting of the Cognitive Science Society.
- Sylvie (presenter), Felix, Mehdi, Bill, Robin, Stergios, Simon
You can find an earlier version of this webpage here.