The situated language and perception reading group meets on even Fridays 10-12 in the seminar room on the 5th floor of Humanistiska fakulteten, Renströmsgatan 6.

Sometimes, and more often recently, we meet online on Zoom (requires GU login).

From here you can also:

  • add your paper suggestions
  • add a paper
  • when adding a paper, please use a link to the published (e.g. ACL) version rather than arXiv if the former exists


  • V&L paper


Please add here any papers (with links) you would like to suggest for the reading group.

  • M. Artetxe, G. Labaka, and E. Agirre. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 451–462, Vancouver, Canada, July 2017. Association for Computational Linguistics. (comes with a video and slides)
  • M. Artetxe, G. Labaka, and E. Agirre. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 789–798, Melbourne, Australia, July 2018. Association for Computational Linguistics. (comes with a video and slides)
  • Sellam, T., Das, D., & Parikh, A. P. (2020). BLEURT: Learning Robust Metrics for Text Generation. (recommended by Nikolai)
  • Tan, H., & Bansal, M. (2019). LXMERT: Learning Cross-Modality Encoder Representations from Transformers. (recommended by Simon)
  • Wu, J., & Mooney, R. J. (2019). Self-Critical Reasoning for Robust Visual Question Answering. (recommended by Simon)
  • Joyce Y. Chai, Rui Fang, Changsong Liu, and Lanbo She. 2017. Collaborative language grounding toward situated human-robot dialogue. AI Magazine, 37(4):32–45.
  • Joyce Y. Chai, Qiaozi Gao, Lanbo She, Shaohua Yang, Sari Saba-Sadiya, and Guangyue Xu. 2018. Language to action: Towards interactive task learning with physical agents. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18).
  • Akbik, Alan & Blythe, Duncan & Vollgraf, Roland. (2018). Contextual String Embeddings for Sequence Labeling. paper (recommended by Axel)
  • Nguyen, Phi & Joty, Shafiq & Hoi, Steven & Socher, Richard. (2020). Tree-structured Attention with Hierarchical Accumulation. paper (recommended by Axel)
  • Wang, Bin & Chen, Fenxiao & Wang, Yuncheng & Kuo, C.. (2020). Efficient Sentence Embedding via Semantic Subspace Analysis. paper (recommended by Axel)
  • Goodman, N. D., & Stuhlmüller, A. (2013). Knowledge and Implicature: Modeling Language Understanding as Social Cognition. Topics in Cognitive Science. paper (recommended by Bill)
  • Tan, H., Dernoncourt, F., Lin, Z., Bui, T., & Bansal, M. (2019). Expressing Visual Relationships via Language. paper (recommended by Nikolai)
  • Thomason, J., Padmakumar, A., Sinapov, J., Walker, N., Jiang, Y., Yedidsion, H., ... & Mooney, R. J. (2020). Jointly improving parsing and perception for natural language commands through human-robot dialog. Journal of Artificial Intelligence Research, 67, 1-48. paper (recommended by Mehdi)
  • Moro, D., Black, S., & Kennington, C. (2019). Composing and Embedding the Words-as-Classifiers Model of Grounded Semantics. arXiv preprint arXiv:1911.03283. (recommended by Staffan)
  • Mollica, F. et al. (2019). Composition is the core driver of the language-selective network. paper (recommended by Mehdi)
  • Malt, B. C., Sloman, S. A., Gennari, S., Shi, M., & Wang, Y. (1999). Knowing versus naming: Similarity and the linguistic categorization of artifacts. Journal of Memory and Language, 40(2), 230-262. paper (recommended by Staffan)
  • Herbelot, A., & Vecchi, E. M. (2015). Building a shared world: Mapping distributional to model-theoretic semantic spaces. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 22-32). (recommended by Staffan)
  • Marcus, G. (2018). Deep learning: A critical appraisal. paper; video comments; (recommended by Mehdi)
  • J. A. Bateman, M. Pomarlan, and G. Kazhoyan. Embodied contextualization: Towards a multistratal ontological treatment. Applied Ontology, Pre-press:1–35, 2 October 2019. paper
  • J. Thomason, M. Murray, M. Cakmak, and L. Zettlemoyer. Vision-and-dialog navigation. In Conference on Robot Learning (CoRL), 2019. paper (recommended by Simon)
  • What are the differences between neural networks and the brain? panel discussion from Center for Brains, Minds and Machines (CBMM) (recommended by Mehdi)
  • W. N. Havard, J.-P. Chevrot, and L. Besacier. Models of visually grounded speech signal pay attention to nouns: a bilingual experiment on English and Japanese. arXiv, arXiv:1902.03052 [cs.CL]:1–5, 2019. paper (recommended by Sylvie)
  • L. Arras, F. Horn, G. Montavon, K.-R. Müller, and W. Samek. “What is relevant in a text document?”: An interpretable machine learning approach. PLOS ONE, 12(8):1–23, 08 2017. paper (recommended by Felix)
  • M. Janner, K. Narasimhan, and R. Barzilay. Representation learning for grounded spatial reasoning. Transactions of the Association for Computational Linguistics, 6:49–61, 2018. (recommended by Mehdi) link
  • Yatskar, M., Zettlemoyer, L., & Farhadi, A. (2016). Situation recognition: Visual semantic role labeling for image understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5534-5542). link (recommended by Mehdi)
  • Mei, H., Bansal, M., & Walter, M. R. (2016, February). Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences. In AAAI (pp. 2772-2778). link (recommended by Mehdi)
  • J. Zwarts and Y. Winter. Vector space semantics: A model-theoretic analysis of locative prepositions. Journal of Logic, Language and Information, 9:169–211, 2000. (recommended by all)
  • one of the papers on this page (Oxford robotics & vision group): link (recommended by Staffan)
  • Ben-Yosef, G., Assif, L., & Ullman, S. (2018). Full interpretation of minimal images. Cognition, 171, 65-84. link video (recommended by Mehdi)


  • Parizi, A. H., & Cook, P. (2020). Evaluating Sub-word embeddings in cross-lingual models. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), May, 2712–2719. (recommended by Tewodros) 2020-06-26
  • Pezzelle, S., & Fernández, R. (2019). Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts. arXiv preprint arXiv:1908.10285. paper (recommended by Staffan) 2020-06-12
  • Talk: Míriam Sánchez-Alcón: The significance of applying attention to Visual Question Answering paper and Wu, J., & Mooney, R. J. (2018). Faithful Multimodal Explanation for Visual Question Answering [cs.CL], 2020. paper (recommended by Simon) 2020-05-29
  • Goodman, N. D., & Frank, M. C. (2016). Pragmatic Language Interpretation as Probabilistic Inference. In Trends in Cognitive Sciences. paper (recommended by Bill) 2020-05-15
  • Talk: David Alfter: Visual features in textual complexity classification: a case study on pictograms paper and Y. Bisk, A. Holtzman, J. Thomason, J. Andreas, Y. Bengio, J. Chai, M. Lapata, A. Lazaridou, J. May, A. Nisnevich, N. Pinto, and J. Turian. Experience grounds language. arXiv, arXiv:2004.10151 [cs.CL], 2020. paper 2020-04-30
  • J. Krause, J. Johnson, R. Krishna, and L. Fei-Fei. A hierarchical approach for generating descriptive image paragraphs. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3337–3345, July 21–26 2017. paper (recommended by Nikolai)
  • Tai, Kai & Socher, Richard & Manning, Christopher. (2015). Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. 1. 10.3115/v1/P15-1150. paper (recommended by Axel) 2020-04-03
  • Cohn-Gordon, R., Goodman, N., & Potts, C. (2018). Pragmatically Informative Image Captioning with Character-Level Inference. paper (recommended by Nikolai) 2020-03-20
  • Anonymous (2020), Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. paper (recommended by Mehdi) 2020-03-06
  • X. Yu, H. Zhang, Y. Song, Y. Song, and C. Zhang. What you see is what you get: Visual pronoun coreference resolution in dialogues. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5122–5131, Hong Kong, China, Nov. 2019. Association for Computational Linguistics. paper (recommended by Sharid and Simon) 2019-12-13
  • Research talk by Vaishnavi Annavarjula 2019-12-02
  • F. Cavicchio, D. Melcher, and M. Poesio. The effect of linguistic and visual salience in visual world studies. Frontiers in Psychology, 5:176, 2014. paper (recommended by Sharid and Simon) 2019-11-15
  • S. Kottur, J. M. Moura, D. Parikh, D. Batra, and M. Rohrbach. Visual coreference resolution in visual dialog using neural module networks. In Proceedings of the European Conference on Computer Vision (ECCV), pages 153–169, 2018. (recommended by Sharid and Simon) link 2019-10-04
  • J. M. Cano Santín. Fast visual grounding in interaction: bringing few-shot learning with neural networks to an interactive robot. Masters in language technology (mlt), 30 hec, Department of Philosophy, Linguistics and Theory of Science (FLOV), University of Gothenburg, Gothenburg, Sweden, September 18 2019. Supervisors: Simon Dobnik and Mehdi Ghanimifard, examiner: Aarne Ranta. 2019-09-18
  • Peters, Matthew E., et al. "Dissecting contextual word embeddings: Architecture and representation." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1499–1509, Brussels, Belgium, October 31 - November 4, 2018. link (recommended by Felix) 2019-05-03
  • Pragst, Louisa, et al. “On the Vector Representation of Utterances in Dialogue Context.” Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018), European Language Resource Association, 2018. link (recommended by Bill Nobel) 2019-04-05
  • Forestier S, Oudeyer P-Y. (2017) A Unified Model of Speech and Tool Use Early Development. Proceedings of the 39th Annual Meeting of the Cognitive Science Society. link (recommended by Sylvie Saget) 2019-03-08
  • Li, J., Chen, X., Hovy, E., & Jurafsky, D. (2016). Visualizing and Understanding Neural Models in NLP. In Proceedings of NAACL-HLT (pp. 681-691). link (recommended by Felix Morger) 2019-02-22
  • G. Collell, L. V. Gool, and M. Moens. Acquiring common sense spatial knowledge through implicit spatial templates. arXiv, arXiv:1711.06821 [cs.AI]:1–8, 2017. link 2019-02-08
  • N. Schneider, J. D. Hwang, V. Srikumar, J. Prange, A. Blodgett, S. R. Moeller, A. Stern, A. Bitan, and O. Abend. Comprehensive supersense disambiguation of English prepositions and possessives. arXiv, arXiv:1805.04905 [cs.CL], 2018. (recommended by Bill) 2018-12-06
  • B. Landau and R. Jackendoff. “what” and “where” in spatial language and spatial cognition. Behavioral and Brain Sciences, 16(2):217–238, 255–265, 1993. Background: B. Landau. Update on “what” and “where” in spatial language: A new division of labor for spatial terms. Cognitive Science, 41(2):321–350, 2016. (recommended by Mehdi) 2018-11-22
  • A. Conneau, G. Kruszewski, G. Lample, L. Barrault, and M. Baroni. What you can cram into a single vector: Probing sentence embeddings for linguistic properties. arXiv, arXiv:1805.01070 [cs.CL], 2018. (recommended by Bill) link 2018-11-02
  • W. Monroe, R. X. D. Hawkins, N. D. Goodman, and C. Potts. Colors in context: A pragmatic neural model for grounded language understanding. Transactions of the Association for Computational Linguistics, 5:325–338, 2017. 2018-10-15 (recommended by Simon and Mehdi) link
  • I. Vulić and N. Mrkšić. Specialising word vectors for lexical entailment. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1134–1145. Association for Computational Linguistics, 2018 (recommended by Bill) 2018-10-08 link
  • ACL 2018 report by Mehdi Ghanimifard 2018-09-21: link
  • Matteo Mossio and Dario Taraborelli. Action-dependent perceptual invariants: From ecological to sensorimotor approaches. Consciousness and cognition, 17(4):1324-1340, 2008. (recommended by Sylvie) 2018-06-01
  • S. C. Marsella and J. Gratch. Ema: A process model of appraisal dynamics. Cognitive Systems Research, 10(1):70–90, 2009. (recommended by Vlad) 2018-05-04
  • Viethen, Jette, Robert Dale, and Markus Guhe. "Generating subsequent reference in shared visual scenes: Computation vs. re-use." Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011. link (recommended by Sylvie) 2018-03-23
  • Stefanie Tellex: Learning Models of Language, Action and Perception for Human-Robot Collaboration video 2018-03-16
  • J. Y. Chai, R. Fang, C. Liu, and L. She. Collaborative language grounding toward situated human-robot dialogue. AI Magazine, 37(4), 2016. (recommended by Mehdi and Simon) 2018-02-09
  • J. Pustejovsky. From affordances to events: Communicating action through language and gesture. Paper manuscript, Department of Computer Science, Brandeis University, Waltham, MA USA, January 2018. (recommended by Robin)
  • Fodor, J. (1998). There are no recognitional concepts; not even RED. Philosophical issues, 9, 1-14. (recommended by Staffan) (2018-01-12)
  • J. Johnson, B. Hariharan, L. van der Maaten, J. Hoffman, F. Li, C. L. Zitnick, and R. B. Girshick. Inferring and executing programs for visual reasoning. CoRR, abs/1705.03633, 2017. link (recommended by Mehdi) 2017-12-08
  • A. Lücking. Modeling co-verbal gesture perception in type theory with records. In Computer Science and Information Systems (FedCSIS), 2016 Federated Conference on, pages 383–392. IEEE, 2016. (recommended by Vlad)
  • M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In Proceedings of the IEEE International Conference on Computer Vision, pages 1–9, 2015. link (recommended by Simon) 2017-10-27
  • H. M. Hersh and A. Caramazza. A fuzzy set approach to modifiers and vagueness in natural language. Journal of Experimental Psychology: General, 105(3):254, 1976. link (recommended by Staffan) 2017-10-13
  • J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Learning to compose neural networks for question answering. CoRR, abs/1601.01705:1–10, 2016. link (recommended by Mehdi), 2017-09-29


2017-02-14: Mehdi: Learning to compose spatial relations with grounded recurrent neural language models
  • Mehdi, Haris, Stergios, Jean-Philippe, Chris, Robin, Yuri and Simon
  • Several interesting papers to read: list of cited papers
2017-02-06: First kick-off meeting
  • Mehdi, Haris, Chris, Robin and Simon
  • Areas of language and perception