Picture for Yancheng He

Yancheng He

ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models

Add code
Feb 27, 2025
Viaarxiv icon

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

Add code
Feb 26, 2025
Viaarxiv icon

AIR: Complex Instruction Generation via Automatic Iterative Refinement

Add code
Feb 25, 2025
Viaarxiv icon

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Add code
Feb 23, 2025
Viaarxiv icon

ChineseSimpleVQA -- "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

Add code
Feb 19, 2025
Viaarxiv icon

MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training

Add code
Feb 17, 2025
Viaarxiv icon

Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models

Add code
Dec 23, 2024
Figure 1 for Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
Figure 2 for Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
Figure 3 for Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
Figure 4 for Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
Viaarxiv icon

Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation

Add code
Dec 19, 2024
Viaarxiv icon

WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis

Add code
Dec 04, 2024
Figure 1 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Figure 2 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Figure 3 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Figure 4 for WiS Platform: Enhancing Evaluation of LLM-Based Multi-Agent Systems Through Game-Based Analysis
Viaarxiv icon

Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models

Add code
Nov 13, 2024
Figure 1 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 2 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 3 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Figure 4 for Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
Viaarxiv icon