Jiaheng Liu

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Nov 13, 2024
Via arXiv

MdEval: Massively Multilingual Code Debugging

Nov 04, 2024
Via arXiv

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

Oct 29, 2024
Via arXiv

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

Oct 28, 2024
Via arXiv

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

Oct 25, 2024
Via arXiv

Aligning CodeLLMs with Direct Preference Optimization

Oct 24, 2024
Via arXiv

Can MLLMs Understand the Deep Implication Behind Chinese Images?

Oct 17, 2024
Via arXiv

PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment

Oct 17, 2024
Via arXiv

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Oct 15, 2024
Via arXiv

ING-VP: MLLMs cannot Play Easy Vision-based Games Yet

Oct 09, 2024
Via arXiv