Multilingual Text Classification


Multilingual text classification is the task of assigning text documents written in multiple languages to predefined categories.

MOSLD-Bench: Multilingual Open-Set Learning and Discovery Benchmark for Text Categorization

Jan 19, 2026

BYOL: Bring Your Own Language Into LLMs

Jan 15, 2026

VoxCog: Towards End-to-End Multilingual Cognitive Impairment Classification through Dialectal Knowledge

Jan 12, 2026

Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training

Jan 13, 2026

X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework

Jan 06, 2026

Low-Resource, High-Impact: Building Corpora for Inclusive Language Technologies

Dec 16, 2025

Correcting Mean Bias in Text Embeddings: A Refined Renormalization with Training-Free Improvements on MMTEB

Nov 14, 2025

Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks

Nov 10, 2025

BUSTED at AraGenEval Shared Task: A Comparative Study of Transformer-Based Models for Arabic AI-Generated Text Detection

Oct 23, 2025

Qwen3Guard Technical Report

Oct 16, 2025