Zhiheng Xi

Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training

Feb 06, 2025

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Jan 20, 2025

ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use

Jan 07, 2025

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Nov 25, 2024

Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling

Nov 01, 2024

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

Oct 24, 2024

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

Oct 15, 2024

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment

Oct 13, 2024

Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data

Aug 27, 2024

Toward Optimal LLM Alignments Using Two-Player Games

Jun 16, 2024