Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Multilingual Controlled Generation And Gold-Standard-Agnostic Evaluation of Code-Mixed Sentences

Oct 14, 2024

Ayushman Gupta, Akhil Bhogal, Kripabandhu Ghosh

Figure 1 for Multilingual Controlled Generation And Gold-Standard-Agnostic Evaluation of Code-Mixed Sentences

Figure 2 for Multilingual Controlled Generation And Gold-Standard-Agnostic Evaluation of Code-Mixed Sentences

Figure 3 for Multilingual Controlled Generation And Gold-Standard-Agnostic Evaluation of Code-Mixed Sentences

Figure 4 for Multilingual Controlled Generation And Gold-Standard-Agnostic Evaluation of Code-Mixed Sentences

Share this with someone who'll enjoy it:

Abstract:Code-mixing, the practice of alternating between two or more languages in an utterance, is a common phenomenon in multilingual communities. Due to the colloquial nature of code-mixing, there is no singular correct way to translate an English sentence into a code-mixed sentence. For this reason, standard n-gram-based MT evaluation metrics such as the BLEU score are not appropriate for code-mixed evaluation. To demonstrate this, we propose a novel method for code-mixed text generation: Controlled Generation, which parameterizes the code-mixing degree (CMD) and enables the generation of multiple semantically equivalent code-mixed sentences from a given English sentence. We introduce a robust new evaluation metric: GAME: A Gold-Standard Agnostic Measure for Evaluation of Code-Mixed Sentences. GAME is both language-agnostic and gold-standard-agnostic, i.e. unlike other metrics, GAME does not require gold-standard code-mixed sentences for evaluation, thus eliminating the need for human annotators in the code-mixed evaluation process. When used to evaluate semantically equivalent code-mixed sentences, we find that GAME scores have a lower standard deviation than BLEU scores. Further, we create and release a dataset containing gold-standard code-mixed sentences across 4 language pairs: English-{Hindi, Bengali, French, Spanish} to encourage more computational research on code-mixing.

* Manuscript submitted to COLING 2025

View paper on

Share this with someone who'll enjoy it:

Title:Multilingual Controlled Generation And Gold-Standard-Agnostic Evaluation of Code-Mixed Sentences

Paper and Code