Xueren Zhang

DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection

Feb 17, 2025
Via arXiv