CLASP
The Centre for Linguistic Theory and Studies in Probability

Spatial Knowledge In Neural Language Models

Understanding and generating spatial descriptions requires, among other things, knowledge about how objects are related geometrically. The wide use of neural language models in different areas, including the generation of scene descriptions, motivates our study of how spatial geometric knowledge is encoded in them. We first examine how spatial descriptions are attended to by a state-of-the-art model of attention in CNNs. We argue that adaptive attention is good at predicting what the objects are but less good at capturing how they relate geometrically. We then explore different ways of encoding explicit spatial information in an end-to-end scene description model. We conclude by summarizing the implications of this work for improving image captioning systems.