Weerayut Buaphet

CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data

Jan 25, 2026
Via arXiv

MultiLexNorm++: A Unified Benchmark and a Generative Model for Lexical Normalization for Asian Languages

Jan 23, 2026
Via arXiv

Seed-Free Synthetic Data Generation Framework for Instruction-Tuning LLMs: A Case Study in Thai

Nov 23, 2024
Via arXiv