collocated with the Eighth Swedish Language Technology Conference (SLTC), University of Gothenburg, Sweden
25th November 2020
The workshop will be held online.
All areas of natural language processing have achieved visible breakthroughs from the use of data-driven models. Contemporary machine learning is significantly influenced by techniques that rely on large datasets and demand substantial computational resources to solve practical problems in a tangible way (e.g. models based on transformers such as BERT, ViLBERT, ALBERT, and GPT-2 that are pre-trained on large corpora of unlabelled data).
However, many of the world’s languages lack linguistic descriptions as well as sufficiently large computer-readable corpora of linguistic material. Even languages that are considered well-resourced have some domains where resources are scarce, for example corpora of dialogue and situated interaction. These domains share a further property with under-resourced languages: since they focus on spoken or spoken-like interaction (in either written or audio form), they show a high variability of input data. Applying state-of-the-art methods based on deep neural networks to the development of data-driven systems in such resource-constrained environments is a non-trivial task.
For this workshop, we encourage contributions in the area of resource creation and representation learning in limited or low-resource environments that tackle the above-mentioned problems. In particular, we would like to open a forum that brings together students, researchers, and experts to address and discuss the following questions:
- How can new resources be constructed or extended for languages and domains that lack standardised representations of linguistic units?
- What experience from building resources for languages that have good coverage today (for example Scandinavian languages) can be ported to building resources for under-resourced languages and domains?
- How can we deal with the variability of data and its standardisation in machine learning approaches?
- What algorithms and methods can we employ to transfer learning from related domains/languages that have good coverage?
- What is the role of multi-task learning in this domain?
- What representations can be learned and how effective are they in different low-resource scenarios?
- How can newly created resources and learned representations be evaluated?
- What ethical considerations are involved?
Intended participants are researchers, PhD students and practitioners from diverse backgrounds (linguistics, computational linguistics, speech, machine learning, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented by invited talks and short presentations of ongoing or completed research.
We invite submissions of 2-page extended non-anonymous abstracts, with any number of additional pages for references, using the ACL/EMNLP template. Papers related to our theme that have already been presented at other venues or published elsewhere will also be considered for presentation. The abstracts will be reviewed by the workshop organisers, and the accepted ones will be posted on the website unless the authors prefer otherwise. There will be no workshop proceedings, but post-proceedings may be organised depending on the interest of authors.
Extended abstracts should be submitted in PDF format at https://easychair.org/conferences/?conf=resourceful2020
- Submission of extended abstracts: 29th September 2020
- Notification of acceptance: 23rd October 2020
- Final version: 10th November 2020
- Workshop date: 25th November 2020
All deadlines are at 11:59 PM UTC-12:00 (“anywhere on Earth”).