12/9/2016

Translator Gator: Gamification as a tool to crowdsource scarce language data

 

Bahasa Indonesia, Indonesia's national language, is spoken as a mother tongue by a mere 7% of the population, and the country boasts over 300 national languages. Although culturally rich, such linguistic diversity poses a problem for policymakers looking to gauge citizen feedback. Without a taxonomy of keywords to track, researchers are limited in their capacity to monitor public discourse.

One solution to this problem, developed by UN Global Pulse Jakarta, is Translator Gator - a mobile language game that crowdsources translations of words from English into any of Indonesia's six most common languages.

Image: Translator Gator.

"Translator Gator is inspired by the need to socialise the 17 Sustainable Development Goals (SDGs), currently being integrated into the Government of Indonesia’s programme, and the need to better monitor progress against the varied indicators. Thus, Translator Gator will raise awareness of the SDGs and develop a taxonomy of keywords to inform research," explains the team at UN Global Pulse Jakarta.

There are four components of the game:

  1. Translate pre-defined SDG-related English words and phrases into everyday Indonesian words and phrases
  2. Evaluate the translations submitted by others to validate the meaning
  3. Suggest synonyms
  4. Classify words and phrases into selected categories.

Throughout the game, users collect points by playing with these four components. These points can then be redeemed for phone credit.

Data collected

Since early January, Translator Gator has received over 109,000 user contributions, creating a wealth of language data.

User surveys revealed that a major driver behind the project's successful data collection was the phone credit incentive for playing. Based on the dictionary completion rates shown in the graph below, different incentives are being considered to increase data collection for less common languages in future phases.

Image: Dictionary completion rates.

Even without complete datasets for all six languages, Translator Gator data has already begun to support research.

In June, a two-day research dive was held to assess the quality of translations, visualize data, and fill in any data gaps. Examples of areas explored at the event include using the datasets to construct a synonym dictionary in Bahasa Indonesia, and using meta-information to make inferences about translation accuracy.

Visit the Translator Gator project page here.
