Zhiheng Xi

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Nov 25, 2024
Via arXiv

Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling

Nov 01, 2024
Via arXiv

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

Oct 24, 2024
Via arXiv

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

Oct 15, 2024
Via arXiv

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment

Oct 13, 2024
Via arXiv

Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data

Aug 27, 2024
Via arXiv

Toward Optimal LLM Alignments Using Two-Player Games

Jun 16, 2024
Via arXiv

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Jun 06, 2024
Via arXiv

Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models

Apr 01, 2024
Via arXiv

Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals

Mar 24, 2024
Via arXiv