Rui Men

HellaSwag-Pro: A Large-Scale Bilingual Benchmark for Evaluating the Robustness of LLMs in Commonsense Reasoning

Feb 17, 2025

Qwen2.5-1M Technical Report

Jan 26, 2025

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Jan 21, 2025

Qwen2.5 Technical Report

Dec 19, 2024

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Sep 18, 2024

Qwen2.5-Coder Technical Report

Sep 18, 2024

Qwen2 Technical Report

Jul 16, 2024

Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

Jun 07, 2024

Qwen Technical Report

Sep 28, 2023

OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Dec 08, 2022