Deep learning and severely under-resourced languages: How much can the model actually learn?

Presented by: Rolando Coto Solano from Dartmouth College
Date: March 23, 2023

Abstract: How do deep learning models behave when faced with truly low-resource languages? We will attempt to define what a "low-resource" language is, and we will look at examples of learning techniques, such as cross-lingual approaches, that do help when learning from dramatically small datasets. By exploring speech recognition, parsing and machine translation, we will look at algorithms that work and algorithms that break under such conditions. We will also discuss the many differences in the nature of low-resource data, and how people go looking for data in the wrong places. Finally, we will discuss techniques such as attention vector analysis that can help us probe what models may be learning in such data-limited scenarios. We will provide examples from the languages Bribri from Costa Rica and Cook Islands Māori from Polynesia.

Location: Attend in person at J330 or via Zoom, https://gu-se.zoom.us/j/66299274809?pwd=Yjc2ejc2VVhraXVJMmhWeWtOQ2NuUT09

Time: 13:15-15:00