Wenbo Su

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Nov 13, 2024
Via arXiv

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

Oct 28, 2024
Via arXiv

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

Oct 25, 2024
Via arXiv

Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment

Oct 23, 2024
Via arXiv

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Oct 15, 2024
Via arXiv

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Jul 23, 2024
Via arXiv

GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

Jun 20, 2024
Via arXiv

R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

Jun 04, 2024
Via arXiv

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

Jun 03, 2024
Via arXiv

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models

Feb 23, 2024
Via arXiv