CLASP
The Centre for Linguistic Theory and Studies in Probability

What does BERT know about words? Unveiling hidden lexical semantic properties

Given the high performance of pre-trained language models on natural language understanding tasks, an important strand of work has focused on the linguistic knowledge encoded inside the models, mainly addressing structure-related aspects. In our work, we explore the knowledge BERT encodes about lexical semantics. We specifically probe BERT representations for lexical polysemy detection, scalar adjective ranking and noun property prediction. We perform intrinsic evaluations against hand-crafted data, and test the extracted representations on the tasks of indirect question-answering and in-context lexical entailment. We show that the model encodes rich information about polysemy and adjective intensity, acquired through pre-training, but has only marginal knowledge of noun properties and their prevalence.
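
As a concrete illustration of this kind of probing setup, below is a minimal sketch (not the exact pipeline used in the work) of how contextualised word representations can be extracted from BERT with the Hugging Face transformers library and compared across sense contexts; the model name, the word_representation helper, and the example sentences are illustrative assumptions.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_representation(sentence: str, target: str) -> torch.Tensor:
    """Average the last-layer BERT vectors of the word pieces spanning `target`
    in `sentence` (hypothetical helper for probing experiments)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, 768)
    # Locate the word-piece span corresponding to the target word.
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    raise ValueError(f"'{target}' not found in the tokenised sentence")

# Two occurrences of a polysemous word: a probe trained on such vectors
# (or a simple cosine comparison) can test whether BERT separates senses.
v_river = word_representation("She sat on the bank of the river.", "bank")
v_money = word_representation("He deposited the cheque at the bank.", "bank")
print(torch.cosine_similarity(v_river, v_money, dim=0).item())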

The work presented was carried out in collaboration with my PhD student, Aina Garí Soler (University Paris-Saclay), within the framework of the MULTISEM ANR project.