Picture for Wenbo Su

Wenbo Su

WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis

Add code
Dec 04, 2024
Figure 1 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Figure 2 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Figure 3 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Figure 4 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Viaarxiv icon

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Add code
Nov 13, 2024
Figure 1 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 2 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 3 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 4 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Viaarxiv icon

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

Add code
Oct 28, 2024
Figure 1 for M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
Figure 2 for M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
Figure 3 for M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
Figure 4 for M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
Viaarxiv icon

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

Add code
Oct 25, 2024
Figure 1 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Figure 2 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Figure 3 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Figure 4 for 2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Viaarxiv icon

Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment

Add code
Oct 23, 2024
Figure 1 for Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment
Figure 2 for Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment
Figure 3 for Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment
Figure 4 for Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment
Viaarxiv icon

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Add code
Oct 15, 2024
Figure 1 for MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Figure 2 for MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Figure 3 for MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Figure 4 for MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Viaarxiv icon

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Add code
Jul 23, 2024
Figure 1 for DDK: Distilling Domain Knowledge for Efficient Large Language Models
Figure 2 for DDK: Distilling Domain Knowledge for Efficient Large Language Models
Figure 3 for DDK: Distilling Domain Knowledge for Efficient Large Language Models
Figure 4 for DDK: Distilling Domain Knowledge for Efficient Large Language Models
Viaarxiv icon

GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models

Add code
Jun 20, 2024
Viaarxiv icon

R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models

Add code
Jun 04, 2024
Figure 1 for R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models
Figure 2 for R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models
Figure 3 for R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models
Figure 4 for R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models
Viaarxiv icon

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

Add code
Jun 03, 2024
Viaarxiv icon