Composing Byte-Pair Encodings for Morphological Sequence Classification
- Event: Seminar
- Lecturer: Adam Ek
- Date: 11 November 2020
- Duration: 2 hours
- Venue: Gothenburg
In this talk I’ll present research on composing sub-word representations into word representations, specifically the representations that a large language model produces for byte-pair tokens. In our paper, we evaluate four methods of obtaining word representations for morphological sequence classification, that is, the task of assigning grammatical features to words. Our experiments show that using an RNN to compose a word's byte-pair representations into a single word representation is consistently more effective than the other three methods across a sample of eight typologically diverse languages with varying numbers of byte-pair tokens per word.
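The abstract names only the RNN composition method. The following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes each byte-pair token of a word already has a fixed-size vector from the language model. It contrasts three order-insensitive reductions (mean, sum, and taking the first token) with an untrained Elman RNN whose final hidden state serves as the word vector. The three reductions are common baselines assumed here for illustration and are not necessarily the other three methods evaluated in the paper. All function and class names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def mean_pool(subword_vectors: List[Vector]) -> Vector:
    """Average a word's byte-pair token vectors, dimension by dimension."""
    n = len(subword_vectors)
    return [sum(dim) / n for dim in zip(*subword_vectors)]


def sum_pool(subword_vectors: List[Vector]) -> Vector:
    """Add a word's byte-pair token vectors, dimension by dimension."""
    return [sum(dim) for dim in zip(*subword_vectors)]


def first_token(subword_vectors: List[Vector]) -> Vector:
    """Use the vector of the word's first byte-pair token as the word vector."""
    return list(subword_vectors[0])


class ElmanRNN:
    """Hypothetical minimal RNN: h_t = tanh(W x_t + U h_{t-1} + b).

    The final hidden state is used as the word representation. Weights are
    random here; in a real setup they would be trained with the classifier.
    """

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(hidden_dim)
        self.W = [[rng.uniform(-scale, scale) for _ in range(input_dim)]
                  for _ in range(hidden_dim)]
        self.U = [[rng.uniform(-scale, scale) for _ in range(hidden_dim)]
                  for _ in range(hidden_dim)]
        self.b = [0.0] * hidden_dim
        self.hidden_dim = hidden_dim

    def __call__(self, subword_vectors: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim
        # Read the byte-pair tokens left to right; each new state depends on
        # the current token vector and the previous hidden state.
        for x in subword_vectors:
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(w_row, x))
                    + sum(u * hj for u, hj in zip(u_row, h))
                    + b
                )
                for w_row, u_row, b in zip(self.W, self.U, self.b)
            ]
        return h


# Hypothetical 4-dimensional vectors for one word split into three byte-pair tokens.
word_pieces = [[1.0, 0.0, 2.0, -1.0], [3.0, 2.0, 0.0, 1.0], [2.0, 1.0, 1.0, 0.0]]

print(mean_pool(word_pieces))  # [2.0, 1.0, 1.0, 0.0]
print(sum_pool(word_pieces))   # [6.0, 3.0, 3.0, 0.0]
print(first_token(word_pieces))  # [1.0, 0.0, 2.0, -1.0]

rnn = ElmanRNN(input_dim=4, hidden_dim=5)
print(rnn(word_pieces))  # five values in (-1, 1)
```

One difference the sketch makes visible is that mean and sum pooling ignore the order of the byte-pair tokens, whereas the RNN reads them in sequence. Reordering the pieces therefore changes the RNN's output but not the pooled vectors.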