Title: IndoNLI: A Natural Language Inference Dataset for Indonesian

Oct 27, 2021

Rahmad Mahendra, Alham Fikri Aji, Samuel Louvan, Fahrurrozi Rahman, Clara Vania

Abstract: We present IndoNLI, the first human-elicited NLI dataset for Indonesian. We adapt the data collection protocol for MNLI and collect nearly 18K sentence pairs annotated by crowd workers and experts. The expert-annotated data is used exclusively as a test set. It is designed to provide a challenging test-bed for Indonesian NLI by explicitly incorporating various linguistic phenomena such as numerical reasoning, structural changes, idioms, and temporal and spatial reasoning. Experimental results show that XLM-R outperforms other pre-trained models on our data. The best performance on the expert-annotated data is still far below human performance (13.4% accuracy gap), suggesting that this test set is especially challenging. Furthermore, our analysis shows that our expert-annotated data is more diverse and contains fewer annotation artifacts than the crowd-annotated data. We hope this dataset can help accelerate progress in Indonesian NLP research.

* Accepted at EMNLP 2021 main conference
