Language, Action, and Perception (APL)

APL HT18 and onwards, Language, Action, and Perception, 7.5 HEC, Språk, handling och perception, 7,5hp, part of Doctoral Degree in Computational Linguistics.

This is a PhD course that explores computational modelling of language and vision, particularly in relation to situated dialogue agents and image classification. It may partially overlap with a parallel course at the master's level: LT2308 ESLP: Embodied and Situated Language Processing or LT2318: Artificial Intelligence: Cognitive Systems.

The course surveys theories and practical computational implementations of how natural language interacts with the physical world through action and perception. Topics include semantic theories and computational approaches to modelling natural language, action, and perception (grounding); situated dialogue systems; integrated robotic systems; generation and interpretation of scene descriptions from images and videos; and spatial cognition, among others.

As the course studies how humans structure and interact with the physical world and express it in language, it bridges into the domains of cognitive science, computer vision, and robotics, and therefore belongs more broadly to the field of cognitive artificial intelligence. Typical applications of computational models of language, action, and perception are image search and retrieval on the web, navigation systems that provide more natural, human-like instructions, and personal robots and situated conversational agents that interact with us in our home environment through language.

The learning outcomes of the course cover three topics: (i) the relation between language and perception in human interaction, (ii) how language and perception are modelled with formal and computational models and methods and how these are integrated with different applications, and (iii) how research in the field is communicated scientifically.

Course prerequisites:

General admission requirements for a doctoral degree in Computational Linguistics or equivalent.

In order to follow the course, participants should have experience at master's level in at least one of the following fields:

Formal semantics and pragmatics
Natural language processing
Computational semantics
Machine learning
Robotics
or equivalent skills and knowledge.

Course syllabus

In Swedish

Requirements

Please read this document and talk to Simon.

Lecturers

Simon Dobnik (course organiser), office hours: by appointment

Course literature

For a list of suggested readings please see here. Individual readings will be suggested for each meeting.

Schedule and course materials

Integrating symbolic common-sense knowledge
- 2021-01-29, Zoom
- J. D. Hwang, C. Bhagavatula, R. L. Bras, J. Da, K. Sakaguchi, A. Bosselut, and Y. Choi. Comet-atomic 2020: On symbolic and neural commonsense knowledge graphs. arXiv, arXiv:2010.05953 [cs.CL]:1–17, 2020. Video of a talk. (Recommended by Nikolai, credit for either APL or ROM, would like to read: Nikolai, Anna, Felix, Axel, Robin, Staffan, Ellen, Simon) 2021-01-29
- ROM: Anna, Felix, Axel, APL: Nikolai
Grounded neural language models and VQA
- 2020-12-04, Zoom
- T. Scialom, P. Bordes, P.-A. Dray, J. Staiano, and P. Gallinari. What BERT sees: Cross-modal transfer for visual question generation. arXiv, arXiv:2002.10832 [cs.CL]:1–11, November 2 2020. (recommended by Nikolai), 2020-12-04
- Nikolai (presenter), Eleni, Maryam, and Simon
Situated communication with robots
- 2020-10-02, Zoom
- J. Y. Chai, Q. Gao, L. She, S. Yang, S. Saba-Sadiya, and G. Xu. Language to action: Towards interactive task learning with physical agents. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 2–9. International Joint Conferences on Artificial Intelligence Organization, 7 2018. 2020-10-02 Talk, workshop with the talk
- Nikolai, Vidya, Vlad, Robin, and Simon (presenter)
Neuro-symbolic representations of affordances and actions
- 2020-09-21, Zoom
- J. Pustejovsky and N. Krishnaswamy. Situated meaning in multimodal dialogue: Human-robot and human-computer interactions. Journal article manuscript, Department of Computer Science, Brandeis University, July 2020. (recommended by Bill and Nikolai) 2020-09-21 Talk
- Bill (presenter), Nikolai (presenter), Vlad, Vidya, Robin, and Simon
Cognitive representations of actions
- 2020-09-04, Zoom
- A. Knott and M. Takac. Roles for event representations in sensorimotor experience, memory formation, and language processing. Topics in Cognitive Science, 2020. (local copy) (recommended by Robin) 2020-09-04
- Robin (presenter), Vidya, Staffan, and Simon
Contextual referring expressions
- 2020-06-12, Zoom
- Pezzelle, S., & Fernández, R. (2019). Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts. arXiv preprint arXiv:1908.10285. (recommended by Staffan) 2020-06-12
- Staffan (presenter), Tewodros, Maryam, Mehdi, Simon, and Robin
Visual question answering and background knowledge
- 2020-05-29, Zoom
- Talk: Míriam Sánchez-Alcón: The significance of applying attention to Visual Question Answering
- Wu, J., & Mooney, R. J. (2018). Faithful Multimodal Explanation for Visual Question Answering [cs.CL], 2020. (recommended by Simon) 2020-05-29
- Nikolai (presenter), Miriam (presenter), Tewodros, Robin, Staffan, Simon
Word complexity and concreteness, requirements for social and embodied NLP
- 2020-04-30, Zoom
- Talk: David Alfter: Visual features in textual complexity classification: a case study on pictograms
- Y. Bisk, A. Holtzman, J. Thomason, J. Andreas, Y. Bengio, J. Chai, M. Lapata, A. Lazaridou, J. May, A. Nisnevich, N. Pinto, and J. Turian. Experience grounds language. arXiv, arXiv:2004.10151 [cs.CL], 2020. 2020-04-30
- Robin, Staffan, Mehdi, Nikolai, Bill, Vlad, Tewodros, Maryam, David, Elena, Simon
Generating image descriptions, natural language generation
- 2020-04-17, Zoom
- J. Krause, J. Johnson, R. Krishna, and L. Fei-Fei. A hierarchical approach for generating descriptive image paragraphs. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3337–3345, July 21–26 2017.
- Nikolai (presenter), Mehdi, Robin, Vlad, Bill, Aram, Maryam, and Simon
Generating image descriptions and pragmatics
- 2020-03-20, Zoom
- Cohn-Gordon, R., Goodman, N., & Potts, C. (2018). Pragmatically Informative Image Captioning with Character-Level Inference.
- Nikolai (presenter), Mehdi, Robin, Vlad, Bill, Tewodros and Simon (check)
Spatial representations, representation learning, interpretability
- 2019-02-08, 10-12, Dicksonsgatan 4
- G. Collell, L. V. Gool, and M. Moens. Acquiring common sense spatial knowledge through implicit spatial templates. arXiv, arXiv:1711.06821 [cs.AI]:1–8, 2017.
- Mehdi (presenter), Felix, Vlad, Robin, Staffan, Simon
Language and action
- 2019-03-08, 10-12, Dicksonsgatan 4
- Forestier S, Oudeyer P-Y. (2017) A Unified Model of Speech and Tool Use Early Development. Proceedings of the 39th Annual Meeting of the Cognitive Science Society.
- Sylvie (presenter), Felix, Mehdi, Bill, Robin, Stergios, Simon

You can find an earlier version of this webpage here.