Picture for Junyang Lin

Junyang Lin

additional authors not shown

Qwen2.5-VL Technical Report

Add code
Feb 19, 2025
Viaarxiv icon

HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning

Add code
Feb 17, 2025
Viaarxiv icon

Multi-Agent Collaboration for Multilingual Code Instruction Tuning

Add code
Feb 11, 2025
Viaarxiv icon

Qwen2.5-1M Technical Report

Add code
Jan 26, 2025
Viaarxiv icon

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

Add code
Jan 24, 2025
Viaarxiv icon

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Add code
Jan 21, 2025
Viaarxiv icon

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Add code
Jan 13, 2025
Viaarxiv icon

Enabling Scalable Oversight via Self-Evolving Critic

Add code
Jan 10, 2025
Viaarxiv icon

SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild

Add code
Jan 07, 2025
Figure 1 for SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
Figure 2 for SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
Figure 3 for SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
Figure 4 for SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
Viaarxiv icon

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Add code
Jan 03, 2025
Viaarxiv icon