Weixiang Yan

CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?

Aug 20, 2024
Via arXiv

Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison

Jul 10, 2024
Via arXiv

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

Jun 19, 2024
Via arXiv

CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification

Apr 30, 2024
Via arXiv

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation

Nov 14, 2023
Via arXiv

CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation

Oct 08, 2023
Via arXiv

Enhancing Generation through Summarization Duality and Explicit Outline Control

May 23, 2023
Via arXiv