CLASP
The Centre for Linguistic Theory and Studies in Probability

Can LLMs Process Indigenous Languages? An Exploration of AI for Language Documentation with Bribri and Cook Islands Māori

Abstract

Large Language Models (LLMs) perform well on language documentation tasks such as transcription, translation and data analysis when working with widely spoken languages. However, their reliability still falls short for languages at the lower end of the resource spectrum. Here we will explore how cutting-edge LLMs perform on documentation tasks in two languages: Bribri (Costa Rica) and Cook Islands Māori. We will also revisit efforts that aim not only to expand the documentation of these languages and improve AI performance on them, but also to strengthen the role of community members in documentation work.