Robert P. Thompson

Code-Switched Language Identification is Harder Than You Think

Feb 02, 2024

Laurie Burchell, Alexandra Birch, Robert P. Thompson, Kenneth Heafield

Abstract: Code switching (CS) is a very common phenomenon in written and spoken communication but one that is handled poorly by many natural language processing applications. Looking to the application of building CS corpora, we explore CS language identification (LID) for corpus building. We make the task more realistic by scaling it to more languages and considering models with simpler architectures for faster inference. We also reformulate the task as a sentence-level multi-label tagging problem to make it more tractable. Having defined the task, we investigate three reasonable models for this task and define metrics which better reflect desired performance. We present empirical evidence that no current approach is adequate and finally provide recommendations for future work in this area.

* EACL 2024
