Picture for Tianyu Liu

Tianyu Liu

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Add code
Feb 23, 2025
Viaarxiv icon

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Add code
Feb 20, 2025
Viaarxiv icon

Multi-Agent Collaboration for Multilingual Code Instruction Tuning

Add code
Feb 11, 2025
Viaarxiv icon

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

Add code
Jan 24, 2025
Viaarxiv icon

Enabling Scalable Oversight via Self-Evolving Critic

Add code
Jan 10, 2025
Viaarxiv icon

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Add code
Dec 30, 2024
Figure 1 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Figure 2 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Figure 3 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Figure 4 for Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Viaarxiv icon

CodeV: Issue Resolving with Visual Data

Add code
Dec 23, 2024
Viaarxiv icon

ExecRepoBench: Multi-level Executable Code Completion Evaluation

Add code
Dec 16, 2024
Figure 1 for ExecRepoBench: Multi-level Executable Code Completion Evaluation
Figure 2 for ExecRepoBench: Multi-level Executable Code Completion Evaluation
Figure 3 for ExecRepoBench: Multi-level Executable Code Completion Evaluation
Figure 4 for ExecRepoBench: Multi-level Executable Code Completion Evaluation
Viaarxiv icon

VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

Add code
Nov 26, 2024
Figure 1 for VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Figure 2 for VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Figure 3 for VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Figure 4 for VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Viaarxiv icon

Likelihood as a Performance Gauge for Retrieval-Augmented Generation

Add code
Nov 12, 2024
Viaarxiv icon